TAML Reference Documentation

TAML is a configuration file format combining some aspects of Markdown, CSV, TOML, YAML and Rust.

As a configuration language, TAML’s main design goals are to be:

  • Human-writeable

  • Human-readable

  • Unambiguous and Debuggable

  • Computer-readable

One central feature is that it uses headings rather than indentation or far-spanning nested brackets to denote complex data structures. Another is the relatively strong distinction between data types.

In addition to this, implementations of the file format should make it easy for software end(!) users to learn about and correct mistakes in configuration files. A number of error codes and descriptions are documented here, which will ideally be largely shared between implementations.

Please refer to the table of contents to the left for examples and details.

Note

TAML is explicitly not a data transfer format.

Most notably, it is not streamable, as repeated non-list fields are not valid and must lead to a parsing error before any data becomes effective. This includes unrelated preceding data in the same document.
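
The consequence for parser authors can be sketched in a few lines of Python. This is only an illustration under stated assumptions: collect_fields and DuplicateFieldError are hypothetical names, and the input is an already-lexed list of key-value pairs rather than real TAML text. The point is that no data is handed out until every pair has been checked.

```python
class DuplicateFieldError(ValueError):
    """Raised when a non-list field is defined more than once."""


def collect_fields(pairs):
    """Build a dict from (key, value) pairs, all or nothing.

    `pairs` stands in for the key-value pairs of one structure; a real
    parser would produce them from TAML text.
    """
    fields = {}
    for key, value in pairs:
        if key in fields:
            # The whole document is invalid, including the data seen so far.
            raise DuplicateFieldError(f"field {key!r} is defined twice")
        fields[key] = value
    # Only now, after the last pair was checked, is any data returned.
    return fields
```

A streaming consumer would already have acted on the first definition of a field by the time the second one shows up, which is exactly what the rule above forbids.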

TAML by Example

Hint

This is a non-normative quickstart guide.

For the thorough format reference, see TAML Grammar Reference.

The simplest TAML document is empty:


While this is valid TAML, such a configuration file will usually be unsupported in practice, as fields are not always optional.

Key-Value Pairs

To define structural fields (including in the implicit top-level context), you can write them as key-value pairs as follows:

// This is a comment. The parser will ignore it.
a_string: "This is Unicode text. You can escape \\ and \"."
some_data: <Some-Encoding:This is a data literal. You can escape \\ and \>.>
an_integer: 5
negative: -0
decimal: 0.0
negative_decimal: -10.0
list: ("Inline lists may contain heterogeneous data but no line breaks.", 1, 2.0, ())

`You can quote identifiers and escape \\ and \` within.`: ()

Note

Integers and decimals are disjoint! If a decimal value is expected, write 1.0 instead of 1.

Tabular List

Lists may also be written as tables with a single column:

# [[items]]
"This is a list in tabular form."

1
2
3
4
5

"This is still part of the list."

Lists continue up to the next heading or end of document. They may not contain nested sections.

Each row in the table creates a new item in the list assigned to the items field, while empty lines and lines with only a comment are ignored.

Enums

Enum values are written as a variant identifier, optionally followed by a list:

unit_variant: Unit
empty_variant: Empty()
newtype_variant: SameAsBefore("This is a nested value.")
tuple_variant: Tuple(1, 2.0, 3, 4, 5)

Hint

Booleans are normally treated as an enumeration type with the unit variants false and true.

Hint

Nullability, as for example expressed by the Option<…> type in Rust, should translate to optional fields rather than explicit enum values.

Structural Variant

Structural variants can’t be expressed inline. Instead, :Variant is attached to the respective field name in a heading:

# a_field:AVariant
a: ()
b: ()

Structural Section

Complex data structures can be represented in TAML as follows:

top_level_field: ()

# outer_structural_field
inner_field: ()

## inner_structural_field
deeply_nested: ()

#
another_top_level_field: ()

This is equivalent to the following JSON:

{
        "top_level_field": [],
        "outer_structural_field": {
                "inner_field": [],
                "inner_structural_field": {
                        "deeply_nested": []
                }
        },
        "another_top_level_field": []
}

Structures in Lists

Structure headings create list items whenever identifiers are wrapped in square brackets ([…]):

# [items]
a: 1
b: 2

# [items]
a: 3
b: 4
c: 5

equals

"items": [
        {
                "a": 1,
                "b": 2
        },
        {
                "a": 3,
                "b": 4,
                "c": 5
        }
]

Note

Fields that are defined twice are normally invalid. However, adding items to an existing list is possible as above.

Path Heading

The following are equivalent:

# a
## [b]
### c
d: 1
e: 2

## f
### g
#### [h]
##### [[j]]
1
2
3
4
5

# k
## l
### m
### n

// Illegal, would redefine `a`:
// # a
// ## o

# a
## [b].c
d: 1
e: 2

## f.g.[h].[[j]]
1
2
3
4
5

# k.l
## m
## n

// Illegal, would redefine `a`:
// # a.o

Multi-Column Table

The following are equivalent:

# [a]
b: 1
## [c]
## d
e: 2
f: 3
##
g: 4

# [a]
b: 5
## [[c]]
6
7
## d
e: 8
f: 9
##
g: 10

# [[a].{b, c, d.{e, f}, g}]
1, (), 2, 3, 4
5, (6, 7), 8, 9, 10

Hint

I don’t recommend manually aligning table cells here, as some people (including me) use proportional fonts almost everywhere.

(taml fmt would undo it by default, too.)

Hint

You can write .{} in a table heading to assign an empty structure to a field in each row.

Or as JSON:

{
        "a": [
                {
                        "b": 1,
                        "c": [],
                        "d": {
                                "e": 2,
                                "f": 3
                        },
                        "g": 4
                },
                {
                        "b": 5,
                        "c": [
                                6,
                                7
                        ],
                        "d": {
                                "e": 8,
                                "f": 9
                        },
                        "g": 10
                }
        ]
}
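
How a table row maps onto nested fields can be sketched in Python. This is a minimal illustration, not how any TAML library does it: expand_row is a hypothetical helper, and the column paths are written out by hand instead of being parsed from the table heading.

```python
def expand_row(column_paths, cells):
    """Turn one table row into a nested dict.

    `column_paths` is a hypothetical, already-parsed form of the table
    heading: one tuple of field names per column.
    """
    if len(column_paths) != len(cells):
        raise ValueError("row length does not match the number of columns")
    item = {}
    for path, cell in zip(column_paths, cells):
        target = item
        for name in path[:-1]:
            # Descend into (or create) the nested structure, e.g. `d`.
            target = target.setdefault(name, {})
        target[path[-1]] = cell
    return item


# Columns of `# [[a].{b, c, d.{e, f}, g}]`, flattened by hand.
COLUMNS = [("b",), ("c",), ("d", "e"), ("d", "f"), ("g",)]
ROWS = [
    [1, [], 2, 3, 4],
    [5, [6, 7], 8, 9, 10],
]

document = {"a": [expand_row(COLUMNS, row) for row in ROWS]}
```

Here document has the same shape as the JSON above. Each row contributes exactly one list item, and a row with too few or too many cells is rejected.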

TAML Grammar Reference

Hint

This page is aimed at format support implementors.

For a user manual (even when using a TAML library as developer), see TAML by Example.

TK: Use singular for headings.

All grammar is defined in terms of Unicode codepoint identity.

Where available, the canonical binary or at-rest encoding of TAML is UTF-8, while its runtime text-API representation should use the canonical representation of arbitrary Unicode strings in the target ecosystem.

Note

Where no standard Unicode text representation exists, it’s likely best to provide only a binary UTF-8 API.

Whitespace

Note

TK: Format as regex section

[ \t]+

Whitespace is meaningless except when separating otherwise-joined tokens.

Note that line breaks are not included here.

Comment

Note

TK: Format as regex section

//[^\r\n]+

At (nearly) any point in the document, a line comment can be written as follows:

// This is a comment. It stretches for the rest of the line.
// This is another comment.

The only limitation to comment placement is that the line up to that point must be otherwise complete.

Line break

Note

TK: Format as regex section

\r?\n

TAML does not use commas to delineate values, outside of inline lists and rows.

Instead, line breaks are a grammar token that separates comments, headings, key-value pairs and table rows.

Note

“Line break” more specifically refers to Unicode code point U+000A LINE FEED (LF), which can optionally be prefixed with a single U+000D CARRIAGE RETURN (CR).

This is the only position in which verbatim carriage return characters are legal. Note that occurrences of the line feed character in quotes are not considered to be a line break token, so a carriage return directly before such a line feed is not legal either. If a quote contains verbatim carriage return characters, correct the literal in question by either replacing them all with \r or deleting them.

Empty lines outside of quotes and lines containing only a comment can always be removed without changing the structure or contents of the document.
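
A cheap first check for the carriage return rule can be sketched in Python. The helper name find_stray_carriage_returns is hypothetical, and the sketch deliberately ignores quotes: it only finds carriage returns that are not followed by a line feed, which are illegal everywhere. A carriage return before a line feed inside a quote would need a real lexer to detect.

```python
import re

# A CR that is not immediately followed by LF is never legal verbatim.
STRAY_CR = re.compile(r"\r(?!\n)")


def find_stray_carriage_returns(text):
    """Return the offsets of carriage returns that are not part of CRLF."""
    return [match.start() for match in STRAY_CR.finditer(text)]
```

An empty result means the document passes this particular check, not that all of its line endings are valid.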

Hint

taml fmt preserves single empty lines but collapses longer blank parts of the document.

taml fix can fix your line endings for you without changing the meaning of quotes. (TODO) It warns about any occurrence of the character it doesn’t fix by default, in either sense. (TODO)

Identifier

Note

TK: Format as regex section

[a-zA-Z_][a-zA-Z\-_0-9]*
`([^\\`\r]|\\\\|\\`|\\r)*`

Identifiers in TAML are arbitrary Unicode strings and can appear in two forms, verbatim and quoted:

Verbatim

Verbatim identifiers must start with an ASCII letter or underscore (_). They may contain only those codepoints plus ASCII digits and the hyphen-minus character (-).
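
As a quick way to try the rule out, the first pattern at the start of this section can be used directly with Python's re module. The function name is_verbatim_identifier is made up for this sketch.

```python
import re

# First character: ASCII letter or underscore.
# Remaining characters: ASCII letters, digits, underscore or hyphen-minus.
VERBATIM_IDENTIFIER = re.compile(r"[a-zA-Z_][a-zA-Z\-_0-9]*")


def is_verbatim_identifier(text):
    """Check whether `text` can be written as an identifier without backticks."""
    return VERBATIM_IDENTIFIER.fullmatch(text) is not None
```

Anything that fails this check, such as an empty name or one starting with a digit, has to be written in the quoted form instead.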

Hint

Support for - is a compatibility affordance.

When outlining a new configuration structure, I recommend for example a_b over a-b, as the former is treated as a single “word” by most text editors. (Try double-clicking each.)

Quoted

Backtick (`)-quoted identifiers are parsed as completely arbitrary Unicode strings.

Only the following characters are backslash-escaped:

  • \ as \\

  • ` as \`

  • U+000D CARRIAGE RETURN (CR) as \r

All other sequences starting with a backslash are invalid in quoted identifiers and must lead to an error.
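
Decoding the part between the backticks could look roughly like the following Python sketch. It accepts the escapes admitted by the quoted pattern at the start of this section (an escaped backslash, an escaped backtick and \r) and rejects everything else. unescape_identifier is a hypothetical name, and the error messages are placeholders rather than documented TAML diagnostics.

```python
def unescape_identifier(body):
    """Decode the text between the backticks of a quoted identifier."""
    result = []
    chars = iter(body)
    for char in chars:
        if char == "`":
            raise ValueError("unescaped backtick inside identifier")
        if char == "\r":
            raise ValueError("verbatim carriage return inside identifier")
        if char != "\\":
            result.append(char)
            continue
        # A backslash must be followed by one of the known escape characters.
        escaped = next(chars, None)
        if escaped == "\\":
            result.append("\\")
        elif escaped == "`":
            result.append("`")
        elif escaped == "r":
            result.append("\r")
        else:
            raise ValueError("invalid escape sequence in identifier")
    return "".join(result)
```

Note that an empty body decodes to an empty identifier, which the format formally allows.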

Warning

Identifiers formally may be empty or contain U+0000 NULL.

However, parsers for ecosystems where this cannot be safely supported are free to limit support here, as long as this limitation is prominently declared.

(A parser written in for example C# or Rust very much should support both, though. A parser written in C or C++ should consider not supporting NULL due to its common special meaning.)

TK: Define an error code that should be used here. Something like TAML-L0001?

Key

Only identifiers may be keys. Keys appear in section headers, enum variants and as part of key-value pairs like the following:

key: value

(value is a unit variant here, but could be replaced with any other value.)

Value

A value is any one of the following:

data literal, decimal, enum variant, integer, list, string, struct.

Warning

TAML processors should be as strict as is sensible regarding value types. For example, if a string is expected, don’t accept an integer and vice versa.

In some cases, remapping TAML value types is a good idea, like when parsing rust_decimal values using Serde, which should still be written as decimals in TAML but internally processed as strings. Such remappings should be done explicitly on a case-by-case basis.

Integer

Note

TK: Format as regex section

-?(0|[1-9]\d*)

A whole number in base 10. Note that -0 is legal and may be interpreted differently from 0.

Additional leading zeroes are disallowed to avoid confusion with languages and/or parsing systems where this would denote base 8.
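
The integer pattern can be tried out with a short Python sketch. Two assumptions are baked in and worth checking against your target: parse_integer is a hypothetical name, and \d is read as "ASCII digit", so the sketch spells it [0-9] because Python's \d also matches non-ASCII digits in str patterns.

```python
import re

# [0-9] instead of \d: in Python, \d also matches non-ASCII digits.
INTEGER = re.compile(r"-?(0|[1-9][0-9]*)")


def parse_integer(token):
    """Parse a TAML integer token into a Python int."""
    if INTEGER.fullmatch(token) is None:
        raise ValueError(f"not a TAML integer: {token!r}")
    # Python's int has no negative zero, so "-0" and "0" both become 0 here.
    return int(token)
```

A target type that can tell -0 from 0 would need to inspect the sign of the token itself rather than the parsed number.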

Hint

If your configuration requires setting a bitfield, consider accepting it as a data literal instead, for example like this:

some_bitfield: <bits:1000_0001 1111_0000>
another_encoding: <hex:81 F0>
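
On the consuming side, the application decides what each encoding name means. The following Python sketch shows one possible reading of the two literals above. Everything in it is an assumption made for illustration: the names decode_bits, DECODERS and decode_data_literal, the choice to ignore underscores and spaces in the bits payload, and the idea that the literal has already been split into encoding and payload.

```python
def decode_bits(payload):
    """Decode groups of eight binary digits, ignoring `_` and spaces."""
    digits = payload.replace("_", "").replace(" ", "")
    if len(digits) % 8 != 0 or set(digits) - {"0", "1"}:
        raise ValueError("expected groups of eight binary digits")
    return bytes(int(digits[i:i + 8], 2) for i in range(0, len(digits), 8))


# Hypothetical registry: encoding name -> decoder for the payload text.
DECODERS = {"bits": decode_bits, "hex": bytes.fromhex}


def decode_data_literal(encoding, payload):
    """Decode the payload of a data literal with an application-chosen decoder."""
    try:
        decoder = DECODERS[encoding]
    except KeyError:
        raise ValueError(f"unsupported encoding: {encoding!r}") from None
    return decoder(payload)
```

Under these assumptions, both example literals decode to the same two bytes, 0x81 and 0xF0.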

Decimal

Note

TK: Format as regex section

-?(0|[1-9]\d*)\.\d+

A fractional base 10 number. Note that -0.0 is legal and may be interpreted differently from 0.0.

Additional leading zeroes are disallowed for consistency with integers. Additional trailing zeroes are considered idempotent and must not make a difference when parsing a value.
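
One way to make trailing zeroes irrelevant is to drop them while lexing. The Python sketch below does this on the token text and then hands the result to the standard decimal module. canonical_decimal and parse_decimal are hypothetical names, [0-9] is used in place of \d on the assumption that only ASCII digits are meant, and Decimal is just one possible target type.

```python
import re
from decimal import Decimal

DECIMAL = re.compile(r"-?(0|[1-9][0-9]*)\.[0-9]+")


def canonical_decimal(token):
    """Validate a decimal token and strip its additional trailing zeroes."""
    if DECIMAL.fullmatch(token) is None:
        raise ValueError(f"not a TAML decimal: {token!r}")
    whole, fraction = token.split(".")
    # Keep one fractional digit so the result is still a decimal.
    return f"{whole}.{fraction.rstrip('0') or '0'}"


def parse_decimal(token):
    """Parse a decimal token with arbitrary precision."""
    return Decimal(canonical_decimal(token))
```

With this, 5.50 and 5.5 produce the same text and the same value, while the sign of -0.0 is preserved for consumers that care about it.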

Note

Integers and decimals should be considered disjoint. Don’t accept one for the other unless not doing so would be unusually inconvenient.

Note

Decimals, like integers, are not required to fit any particular binary representation.

For example, they could be parsed and processed with arbitrary precision rather than as IEEE 754 float.

Warning

taml fmt removes idempotent trailing zeroes from decimals.

serde_taml excludes them while lexing, which also affects reserde.

Absolutely do not make any distinction regarding additional trailing zeroes in decimals when writing a lexer or parser.

String

Note

TK: Format as regex section

"([^\\"\r]|\\\\|\\"|\\r)*"

Strings are written as quoted Unicode literals. The characters \, " and U+000D CARRIAGE RETURN (CR) must be escaped as \\, \" and \r, respectively.

The character U+0000 NULL may be unsupported in environments where processing it would be unreasonably error-prone.
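
For experimentation, the string pattern above can be combined with a small unescaping step in Python. parse_string and ESCAPES are hypothetical names, and the sketch accepts U+0000 NULL because Python strings handle it without trouble.

```python
import re

# The pattern from this section, with a capture group around the body.
STRING = re.compile(r'"((?:[^\\"\r]|\\\\|\\"|\\r)*)"')

# Character after the backslash -> character it stands for.
ESCAPES = {"\\": "\\", '"': '"', "r": "\r"}


def parse_string(token):
    """Parse a complete string token, including its surrounding quotes."""
    match = STRING.fullmatch(token)
    if match is None:
        raise ValueError("not a valid TAML string literal")
    # Every backslash in a matched body starts one of the three escapes.
    return re.sub(r"\\(.)", lambda m: ESCAPES[m.group(1)], match.group(1))
```

Because the pattern is checked first, unknown escapes such as \n and verbatim carriage returns are rejected before any unescaping happens.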

Enum Variants

TK

Unit Variant

Unit variants are written as single identifiers.

Notable unit variants are the boolean values true and false, which are not associated with more specific grammar in TAML.

List

TK

Inline Lists

Sections

TAML’s grammar is, roughly speaking, split into three contexts:

  • structural sections

  • headings

  • tabular sections

Structural Sections

The initial context is a structural section. Structural sections can contain key-value pairs and nested sections, which can be structural sections.

first: 1
second: 2

# third
first: 3.1
second: 3.2

Each nested section is introduced by a heading that is nested exactly one level deeper than the heading of the surrounding section.

It continues until a heading with at most equal depth is encountered or up to the end of the file. An empty nested heading can be used to semantically (but not grammatically!) return to its immediately surrounding structural section.

first: 1
second: 2

# third
first: 3.1
second: 3.2

## third
first: "3.3.1"
second: "3.3.2"

## fourth
first: "3.4.1"
second: "3.4.2"

#
fourth: 4
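
The depth rules can be modelled with a simple stack, as in the following Python sketch. It is a simplification under several assumptions: heading_paths is a hypothetical name, headings are given as already-parsed (depth, name) pairs, list brackets, variants and dotted paths are left out, and a heading that goes more than one level deeper than the currently open section is rejected.

```python
def heading_paths(headings):
    """Return the field path that each (depth, name) heading selects.

    `depth` is the number of `#` characters; `name` is the identifier
    after them, or None for an empty heading.
    """
    path = []  # Names of the currently open nested sections.
    result = []
    for depth, name in headings:
        if depth < 1 or depth > len(path) + 1:
            raise ValueError(f"heading depth {depth} skips a level")
        # A heading of depth n closes every open section of depth >= n.
        del path[depth - 1:]
        if name is not None:
            path.append(name)
        result.append(tuple(path))
    return result


# The headings of the example above: `# third`, `## third`, `## fourth`, `#`.
EXAMPLE = [(1, "third"), (2, "third"), (2, "fourth"), (1, None)]
paths = heading_paths(EXAMPLE)
```

For the example, paths ends with an empty tuple: after the empty heading, fourth: 4 is assigned at the top level again.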

Headings

Tabular Sections

Tabular sections are a special shorthand to quickly define lists with structured content.

The following are equivalent:

# [[dishes].{id, name, price.{currency, amount}}]
<luid:d6fce69d-9c9d>, "A", EUR, 10.95
<luid:c37dcc6a-2002>, "B", EUR, 5.50
<luid:00000000-0000>, "Test Item", EUR, 0.0

# [dishes]
id: <luid:d6fce69d-9c9d>
name: "A"
## price
currency: EUR
amount: 10.95

# [dishes]
id: <luid:c37dcc6a-2002>
name: "B"
## price
currency: EUR
amount: 5.50

# [dishes]
id: <luid:00000000-0000>
name: "Test Item"
## price
currency: EUR
amount: 0.0

Hint

As of right now, there is intentionally no way to define common values once per table.

I haven’t found a way to express this that both is intuitive and won’t make copy/paste errors much more likely.

Row

TK

TAML Diagnostics

Formatting TAML

As a general rule, TAML code in this documentation follows the recommended formatting rules and would stay unchanged if taml fmt were used on it. Exceptions are explicitly noted.