To Quote or not to Quote?
This article covers scalar styles in YAML 1.1 and 1.2. They mostly work the same in both versions.
One design goal of YAML was to be human friendly. It should be easy to read and edit, even if that makes parsing it harder.
Let's look at strings, specifically.
JSON has only one style to encode strings: the double quoted style, which doesn't allow literal linebreaks.
YAML is a data serialization language, but YAML files are used for many different purposes, and there are many types of strings, especially multiline strings. For each use case, you can choose the type of quoting (or no quoting) that makes the string readable and easy to edit.
This gives you lots of freedom, but you also have to learn how to use it to avoid mistakes.
You basically have five ways to express a string:
- Flow Scalars (plain, single quoted, double quoted)
- Block Scalars (literal, folded)
Quick comparison - tl;dr
---
# Flow Scalar Styles
plain scalars:
- a string with a \ backslash that doesn't need to be escaped
- can also use " quotes ' and $ a % lot /&?+ of other {} [] stuff
- a string
  on multiple lines
single quoted:
- '& starts with a special character, needs quotes'
- 'no need to escape backslash \ and double " quote'
- 'to express one single quote, use '' two of them'
double quoted:
- "here we can use predefined escape sequences like \t \n \b"
- "or generic escape sequences \x0b \u0041 \U00000041"
- "the double quote \" needs to be escaped"
- "just like the \\ backslash"
- "the single quote ' and other characters must not be escaped"
# Block Scalar Styles
literal block scalar: |
  a multiline text
  line 2
  line 3
# mnemonic: '>' is a folded '|'
folded block scalar: >
  a long line split into
  several short
  lines for readability
Flow Scalars
The three types of Flow Scalars have some common rules.
They can be on multiple lines, and can start on the same line as the parent node.
Linebreaks are subject to flow folding:
---
single: simple scalar on one line
---
multi: this is
  all one
  single line
same as: this is all one single line
---
multi:
  this is
  all one
  single line
same as: this is all one single line
This can be very useful if you have a long string and want to limit the length of the lines in your YAML file.
Whitespace at the beginning or end of a line is ignored:
flow: a
  b
     c
   d
  e
same as: a b c d e
Whitespace inside a line is kept:
flow: a b	c d e
same as: "a b\tc d e"
You should avoid using literal tabs, especially in plain scalars.
There's also a way to enforce newlines. If you add a blank line, it will not be folded:
multi: a
  b

  c
  d
single: "a b\nc d"
Every following empty line after the first will be kept as a newline, too:
multi: a
  b


  c
  d
single: "a b\n\nc d"
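If you prefer to see the folding rules as code, here is a minimal Python sketch of them. It is only a model of the rules described above, not a YAML parser. The function name fold_flow_lines is made up for illustration, it expects the physical lines of a plain scalar as a non-empty list, and it strips whitespace on every line (quoted scalars keep the spaces next to their quotes, which this sketch ignores).

```python
def fold_flow_lines(lines):
    """Fold the physical lines of a plain flow scalar into its value.

    Illustrative model only (hypothetical helper, not a YAML library API).
    `lines` must be a non-empty list whose first and last entries contain text.
    """
    # Leading and trailing whitespace of every line is ignored.
    stripped = [line.strip(" \t") for line in lines]
    result = stripped[0]
    empty_lines = 0
    for line in stripped[1:]:
        if line == "":
            # Remember empty lines; each one becomes a newline later.
            empty_lines += 1
            continue
        # A single line break folds to a space; n empty lines become n newlines.
        result += "\n" * empty_lines if empty_lines else " "
        result += line
        empty_lines = 0
    return result


print(repr(fold_flow_lines(["a", "  b", "", "  c", "  d"])))      # 'a b\nc d'
print(repr(fold_flow_lines(["a", "  b", "", "", "  c", "  d"])))  # 'a b\n\nc d'
```

The two calls mirror the two examples with blank lines above.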
Plain Scalars
In YAML, you can write a string without quotes, if it doesn't have a special meaning. See the next section for cases where you have to quote a string.
a string: no quotes needed
another string: with 'single' and "double" quotes
a url: http://example.org/
You can use literal tabs, backslashes and unicode characters:
a string: with a real tab character and a \ backslash
But note that literal tabs are discouraged, as there are edge cases, and they are usually not easy to see.
You cannot use escape sequences like \n or \t here; they will be returned literally as "Backslash + n" / "Backslash + t".
A comment will end such a plain scalar, so the following example is invalid:
multi: first # a comment
  second # this is invalid
You can only use a comment at the end:
multi: first
  second # a comment
Note that while a plain scalar cannot start with -<space>, for example, its continuation lines can, although this might look like a badly indented sequence:
- a multiline
  - plain string
# same as
- "a multiline - plain string"
So you should avoid this.
When not to use Plain Scalars
Because a plain scalar without quotes can conflict with YAML syntax elements, there are some cases where you cannot use it.
Characters that cannot be used at the beginning of a plain scalar:
- ! Tag like !!null
- & Anchor like &mapping_for_later_use
- * Alias like *mapping_for_later_use
- -<space> Block sequence entry
- :<space> Block mapping entry
- ?<space> Explicit mapping key
- {, }, [, ] Flow mapping or sequence
- , Flow Collection entry separator
- # Comment
- |, > Block Scalar
- @, ` (backtick) Reserved characters
- ", ' Double and single quote
- <whitespace>
- % Directive
Examples:
- this @ is ok
- @but this not
- comma,
- ,no comma
Character sequences that can't be used inside a plain scalar:
- :<space> Key/value separator. A colon is allowed, but only if it's not followed by whitespace
- <space># This starts a comment
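The rules listed above are easy to turn into a quick check. The following Python sketch is a rough, hypothetical helper (needs_quotes is not part of any YAML library) that covers only the rules from this section for block context. A real emitter checks more, for example a trailing colon, linebreaks, flow collections and type resolution.

```python
# Characters the list above forbids at the start of a plain scalar.
FORBIDDEN_FIRST_CHARS = set("!&*{}[],#|>@`\"'%")
# These indicators are only forbidden at the start when followed by a space.
FORBIDDEN_PREFIXES = ("- ", ": ", "? ")
# Sequences that end or split a plain scalar anywhere inside it.
FORBIDDEN_INSIDE = (": ", " #")


def needs_quotes(text):
    """Return True if `text` breaks one of the plain scalar rules listed above.

    Hypothetical helper for one-line strings in block context only.
    """
    if text == "" or text != text.strip():
        # An empty plain scalar would be read as null, and leading or
        # trailing whitespace would be dropped.
        return True
    if text[0] in FORBIDDEN_FIRST_CHARS or text.startswith(FORBIDDEN_PREFIXES):
        return True
    return any(sequence in text for sequence in FORBIDDEN_INSIDE)


for candidate in ["this @ is ok", "@but this not", "comma,", ",no comma"]:
    print(candidate, "->", "quote it" if needs_quotes(candidate) else "plain is fine")
```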
Additional exceptions for scalars in Flow Style Collections:
flow style sequence: [ string one, string two ]
flow style mapping: { key: value }
As you can see, a comma or a square bracket will end a plain scalar. Therefore, to avoid confusion, the following characters or character sequences are not allowed in plain scalars inside flow collections:
- [, ]
- {, }
- ,
- :[
- :]
- :{
- :}
- :,
In flow collections, a colon also indicates a mapping key if it is directly followed by one of the characters [, ], {, } or a comma:
# Some processors don't implement this correctly. To be
# sure you should always add a space.
flow mapping: {key:[sequence]}
The following example with a colon without a space is also valid in flow style collections, but some processors don't allow it (currently):
request: { url: http://example.org/ }
urls: [http://example.org/, http://yaml.org/]
Finally, to be compatible with JSON, you can also omit the space if the key is quoted:
flow mapping: { "quoted":23 }
Special types
Another use case for quotes is when you have a string that would be resolved as a special type. This highly depends on the YAML version and on the Schema in use. Here are some examples where you need quotes:
- true, false
- 23
- 1e3
- 3.14159
- null
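To get a feeling for which plain scalars are affected, here is a small Python sketch. The patterns follow my reading of the YAML 1.2 Core Schema, and the names CORE_SCHEMA and resolved_type are made up for this example. Keep in mind that YAML 1.1 processors usually resolve more strings (for example yes, no, on and off as booleans), so treat this as an approximation for one schema only.

```python
import re

# Patterns modelled on the YAML 1.2 Core Schema, tried in this order.
CORE_SCHEMA = [
    ("null", re.compile(r"null|Null|NULL|~|")),
    ("bool", re.compile(r"true|True|TRUE|false|False|FALSE")),
    ("int", re.compile(r"[-+]?[0-9]+|0o[0-7]+|0x[0-9a-fA-F]+")),
    ("float", re.compile(
        r"[-+]?(\.[0-9]+|[0-9]+(\.[0-9]*)?)([eE][-+]?[0-9]+)?"
        r"|[-+]?\.(inf|Inf|INF)|\.(nan|NaN|NAN)")),
]


def resolved_type(plain):
    """Return the non-string type a plain scalar would resolve to, or None.

    None means the scalar stays a string and needs no quotes for this reason.
    """
    for name, pattern in CORE_SCHEMA:
        if pattern.fullmatch(plain):
            return name
    return None


for value in ["true", "23", "1e3", "3.14159", "null", "yes"]:
    print(value, "->", resolved_type(value))
```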
Single Quoted Scalar
In the last section you learned when you have to quote a scalar. Single quoted scalars mostly work like plain scalars, except that the special character sequences are allowed:
a string: '&enclosed in single quotes'
colon plus space: 'this colon : would be forbidden without quotes'
another colon plus space: 'this colon : would create a mapping without quotes'
no comment: 'this would be # a comment without quotes'
curly brace: '{ this would be: a flow style mapping }'
square bracket: '[ this would be a flow style sequence ]'
backslash: 'this \n is a backslash and "n", not a linebreak'
Any character except ' will be returned literally. You cannot use escape sequences here. The single quote itself is escaped by doubling it:
a string: 'with one single '' quote'
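Because doubling the quote is the only rule, writing and reading a single quoted scalar takes just a few lines. This is a minimal Python sketch with made-up function names, and it assumes a string on one line (linebreaks would be subject to folding).

```python
def to_single_quoted(text):
    """Wrap a one-line string in single quotes, doubling embedded quotes."""
    return "'" + text.replace("'", "''") + "'"


def from_single_quoted(scalar):
    """Inverse of to_single_quoted for a one-line single quoted scalar."""
    if len(scalar) < 2 or scalar[0] != "'" or scalar[-1] != "'":
        raise ValueError("not a single quoted scalar")
    return scalar[1:-1].replace("''", "'")


print(to_single_quoted("with one single ' quote"))  # 'with one single '' quote'
```

Note that a backslash passes through both functions untouched.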
The following demonstrates that a backslash is not an escape character:
a string: 'this is \' # the end of the string'
In JSON, this would be:
{ "a string": "this is \\" }
So the # the end of the string' is really a comment.
Multiple lines
Folding rules are the same as for all flow scalars:
multi: 'a
  b
  c
  d'
single: 'a b c d'
However, spaces after the starting quote or before the ending quote will be kept:
multi: ' a
  b
  c
  d '
single: " a b c d "
Double Quoted Scalar
A double quoted scalar has the same rules as a single quoted scalar, plus escape sequences. This is the only scalar style where you can use escape sequences.
The escaping rules are compatible with JSON. (I should note, though, that it also depends on the processor you use, since not all are fully JSON compatible. The incompatible cases should be rare, though.)
a string: "here's a \t tab and a \n newline, followed by a \\ backslash"
another string: "with an escaped \" double quote"
It's important to note that only a limited set of characters can be escaped. Other escapes are invalid:
- "invalid \. escape"
- "invalid \' escape"
- "invalid \- escape"
There are special escape sequences which let you express any character:
- "a \x20 space"
- "a vertical \v tab can also be written as \x0B or \x0b"
- "an 'A' in 8-bit unicode: \x41"
- "an 'A' in 16-bit unicode: \u0041"
- "an 'A' in 32-bit unicode: \U00000041"
The list of allowed escapes can be found in the YAML specification.
In YAML 1.1, escaping of a slash is forbidden. In 1.2, this was one of the changes made to be compatible with JSON:
string: "escaped \/ slash"
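If you generate YAML yourself, the escaping rules above could be applied roughly like this. The Python sketch below is an assumption-laden illustration, not how any particular library does it: the names SHORT_ESCAPES and to_double_quoted are made up, the escape table is only a subset, and non-ASCII characters that a real emitter might also escape are written literally.

```python
# Short escapes for a few common characters; this table is only a subset.
SHORT_ESCAPES = {
    "\\": "\\\\",
    '"': '\\"',
    "\n": "\\n",
    "\t": "\\t",
    "\r": "\\r",
    "\0": "\\0",
}


def to_double_quoted(text):
    """Write `text` as a one-line double quoted scalar (illustrative only)."""
    parts = []
    for char in text:
        code = ord(char)
        if char in SHORT_ESCAPES:
            parts.append(SHORT_ESCAPES[char])
        elif code < 0x20 or code == 0x7F:
            # Remaining ASCII control characters use the generic 8-bit escape.
            parts.append("\\x%02X" % code)
        else:
            # Everything else, including ' and /, is written literally.
            parts.append(char)
    return '"' + "".join(parts) + '"'


print(to_double_quoted('a tab\tand a "quote"'))  # "a tab\tand a \"quote\""
```

The slash is never escaped here, so the output works for both YAML 1.1 and 1.2.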
The backslash also has an additional meaning. If you add it to the end of a line, the next line will be folded without a space. This is useful when you want to break a long string into several lines, but it doesn't have spaces anywhere:
a long string without spaces: "loooooooooooooooooooooooooooooooooooooooooooooooooongstring\
  loooooooooooooooooooooooooooooooooooooooooooooooooongstring\
  loooooooooooooooooooooooooooooooooooooooooooooooooongstring"
You can also use it to preserve spaces at the end:
multi: "the first line ends with 5 spaces     \
  second line"
single: "the first line ends with 5 spaces     second line"
In that case the five spaces are preserved instead of being stripped by folding. You can use a Backslash plus Space at the beginning of the line to get a similar effect:
multi: "first
  \     5 spaces
  third"
single: "first      5 spaces third"
Note that you will actually get six spaces in this case!
Block Scalars
When your string is longer, it can be a good idea to use a block scalar to make it more readable.
An advantage is that inside of the block scalar any character sequence is allowed. It is ended by a less indented line so you can freely use :<space>, <space># and quotes.
You can choose between Literal and Folded Block Scalars.
You cannot use escape sequences like \t for tabs.
Spaces at the beginning or end of the line will be preserved.
Literal Block Scalar
A Literal Block Scalar is introduced with the | pipe. The content starts on the next line and has to be indented:
literal: |
  line 1
   line 2
  end
The indentation is detected from the first (non-empty) line of the block scalar.
The newlines will be preserved, so this is equivalent to:
quoted: "line 1\n line 2\nend\n"
This way you can add all kinds of text to your YAML, for example a shell script:
bash: |
  #!/usr/bin/env bash
  echo "Help, I'm trapped in a YAML document!"
  exit 1
You could even embed a YAML document in YAML! If you ever had to do this in JSON, you know how ugly this can get.
Also trailing spaces will be preserved.
- |
  trailing spaces 
  trailing tab	
- "trailing spaces \ntrailing tab\t\n"
Folded Block Scalar
The Folded Block Scalar, as the name suggests, will fold its lines with spaces. It is introduced with the > sign, which can be seen as a folded |.
a long command: >
  apt-get update
  && apt-get install -y
  git tig vim jq tmux tmate git-subrepo
quoted: "apt-get update && apt-get install -y git tig vim jq tmux tmate git-subrepo\n"
The folding rules are actually almost the same as for quoted scalars. You can enforce a newline with an empty line:
a text with long lines: >
  this is the first
  long line

  and this is the
  second
quoted: "this is the first long line\nand this is the second\n"
There's an additional way to enforce newlines that is probably not very well known. Lines that are indented more than the first line are not folded:
a long text with enforced newlines: >
  line
  one
   line two
   line three
  line
  four
quoted: "line one\n line two\n line three\nline four\n"
Another difference from flow folding is that trailing spaces are kept:
a text with long lines: >
  trailing spaces 
  continued
quoted: "trailing spaces  continued\n"
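The block folding rules from this section can be summarized in a short Python sketch. As before, this is only a model with a made-up function name, not a parser. It expects the content lines with the block indentation already removed, uses "" for an empty line, and applies the default chomping (a single trailing newline).

```python
def fold_block_lines(lines):
    """Fold the lines of a folded block scalar (default chomping).

    Illustrative model only. `lines` are the content lines with the block
    indentation already removed; "" stands for an empty line.
    """
    content = list(lines)
    while content and content[-1] == "":
        content.pop()  # trailing empty lines are dropped by default
    if not content:
        return ""

    def more_indented(line):
        return line[0] in " \t"

    result = ""
    previous = None  # last non-empty line seen so far
    empty_lines = 0
    for line in content:
        if line == "":
            empty_lines += 1
            continue
        if previous is None:
            # Leading empty lines are preserved as newlines.
            result += "\n" * empty_lines
        elif more_indented(previous) or more_indented(line):
            # Line breaks next to more indented lines are never folded.
            result += "\n" * (empty_lines + 1)
        elif empty_lines:
            # n empty lines between two text lines become n newlines.
            result += "\n" * empty_lines
        else:
            # A single line break between two text lines folds to a space.
            result += " "
        result += line  # trailing spaces of the line are kept
        previous = line
        empty_lines = 0
    return result + "\n"


print(repr(fold_block_lines(["line", "one", " line two", " line three", "line", "four"])))
# 'line one\n line two\n line three\nline four\n'
```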
Comments
If a line starting with # is indented correctly, it will not be interpreted as a comment:
literal: |
  a
  # no comment
  b
quoted: "a\n# no comment\nb\n"
Also note that even the first line can start with a #:
folded: >
  # no comment
  a
  b
quoted: "# no comment a b\n"
A less indented line starting with a # will be interpreted as a comment and will also end the block scalar:
folded: >
  a
  b
# a comment, end of block scalar
You can add comments to a block scalar directly after the header:
literal: | # a block scalar
  abc
  def
folded: > # a block scalar
  abc
  def
Leading Empty Lines
Unlike trailing empty lines, leading empty lines will be preserved. Note that lines containing only spaces count as empty lines here. An underscore "_" is used to represent the spaces:
folded: >

__
  a
  b
quoted: "\n\na b\n"
Chomp Indicator
You might have noticed that Block Scalars always end with a newline. This is the default behaviour. Any further trailing newlines will be stripped:
literal: |
  a
  b


quoted: "a\nb\n"
folded: >
  a
  b


quoted: "a b\n"
If you don't want to end your scalar with a newline, you can use the - chomping indicator:
literal: |-
  a
  b
quoted: "a\nb"
folded: >-
  a
  b
quoted: "a b"
If you want to keep all trailing newlines, use the "keep" + indicator:
literal: |+
  a
  b


quoted: "a\nb\n\n\n"
folded: >+
  a
  b


quoted: "a b\n\n\n"
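The three chomping modes differ only in how they treat the newlines at the very end. Here is a minimal Python sketch of that step. The function name apply_chomping is made up, and it assumes that `content` is the scalar text after folding, ending with at least one newline and including all trailing empty lines.

```python
def apply_chomping(content, indicator=""):
    """Apply a chomping indicator to the text of a block scalar.

    `content` is the scalar text including all of its trailing newlines;
    `indicator` is "" (the default), "-" (strip) or "+" (keep).
    """
    if indicator == "+":
        # Keep: all trailing newlines stay.
        return content
    stripped = content.rstrip("\n")
    if indicator == "-":
        # Strip: no trailing newline at all.
        return stripped
    if indicator == "":
        # Default: exactly one trailing newline, unless there is no content.
        return stripped + "\n" if stripped else ""
    raise ValueError("unknown chomping indicator: %r" % indicator)


raw = "a\nb\n\n\n"
print(repr(apply_chomping(raw)))       # 'a\nb\n'
print(repr(apply_chomping(raw, "-")))  # 'a\nb'
print(repr(apply_chomping(raw, "+")))  # 'a\nb\n\n\n'
```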
Block Scalar Indenting
Sometimes, your block scalar might start with one or multiple spaces that you want to preserve:
literal: | # invalid!
   This Is A Header
  The body starts here
All continuation lines in a block scalar have to be indented at least as much as the first line. So how can you preserve the spaces in the first line? By specifying the number of indentation spaces in the block scalar header:
implicit: | # indentation is 1
 line
explicit: |2
   This Is A Header
  The body starts here
quoted: " This Is A Header\nThe body starts here\n"
This tells the YAML processor that the indentation is 2. Note that the number must be greater than zero. You can also combine the indicators, and the order does not matter:
literal: |-2
   header
  body
quoted: " header\nbody"
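To make the difference between detected and explicit indentation more tangible, here is a small Python sketch. The function dedent_block is hypothetical and only models the rule described above. It assumes a block scalar at the top level, as in the examples, where the number in the header equals the number of spaces to remove (in nested structures the number is counted relative to the parent node).

```python
def dedent_block(lines, indent=None):
    """Remove the block indentation from the raw lines of a block scalar.

    `indent` is the number of indentation spaces; if it is None, it is
    detected from the first non-empty line. Illustrative model only.
    """
    if indent is None:
        first = next((line for line in lines if line.strip(" ")), "")
        indent = len(first) - len(first.lstrip(" "))
    content = []
    for line in lines:
        has_text = bool(line.strip(" "))
        if has_text and line[:indent] != " " * indent:
            # A real parser would end the block scalar here; the following
            # line then usually causes a syntax error.
            raise ValueError("line is less indented than the block: %r" % line)
        content.append(line[indent:])
    return content


header_lines = ["   This Is A Header", "  The body starts here"]
print(dedent_block(header_lines, indent=2))
# [' This Is A Header', 'The body starts here']
```

Calling dedent_block(header_lines) without the explicit indent raises a ValueError, which corresponds to the invalid example above.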
Document Header and Footer
A special note about the Document Headers and Footers.
In YAML, ---<space> or ---<linebreak> at the beginning of a line explicitly starts the document.
...<space> or ...<linebreak> ends a document.
Even inside of Block Scalars or Quoted Scalars they still have their special meaning.
If your YAML document consists of only one string at the top level, you should still indent it, because otherwise it might break your content if it contains --- or ... at the beginning of a line.
--- "valid --- scalar"
--- "valid ... scalar"
--- >
  block scalar
  --- more block scalar
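If you want to check whether a line of your content would collide with these markers, a test like the following Python sketch is enough. The function name is made up, and besides the space and linebreak mentioned above it also treats a tab after the marker as a separator, which is my reading of the specification.

```python
def is_document_marker(line):
    """Return True if `line` would be read as a document start or end marker."""
    for marker in ("---", "..."):
        if line.startswith(marker):
            rest = line[len(marker):]
            # The marker must be followed by whitespace or the end of the line.
            return rest == "" or rest[0] in " \t\n"
    return False


print(is_document_marker("--- more block scalar"))    # True
print(is_document_marker("  --- more block scalar"))  # False
```

The second call shows why indenting the content is enough to make it safe.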