What is YAML?
YAML is a computer data serialization language.
A YAML document represents a computer program's native data structure in a human-readable text form. A node in a YAML document is one of three basic kinds:
- Scalar
Atomic data types like strings, numbers, booleans and null
- Sequence
A list of nodes
- Mapping
A map of nodes to nodes. Also known as Hashes, Hash Maps, Dictionaries or Objects.
Unlike in many programming languages, a key can be more than just a string.
It can be a sequence or mapping itself.
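To make the three node kinds more concrete, here is a rough sketch of how they might look as Python values. The choice of Python types is an assumption for illustration only; every YAML library decides for itself how it represents nodes. A sequence used as a key is shown as a tuple, because Python dict keys must be hashable.

```python
# Illustrative Python counterparts of the three basic YAML node kinds.
# The exact types depend on the YAML library; these are assumptions.

scalars = ["North Pole", 314159, True, None]   # string, number, boolean, null
sequence = ["Sled", "Wrapping Paper"]          # a list of nodes
mapping = {"name": "Santa Claus"}              # a map of nodes to nodes

# A key does not have to be a string. A sequence key is shown here as a
# tuple, because Python dict keys must be hashable.
dice_results = {(2, 3): "first roll", (3, 6): "second roll"}

print(dice_results[(3, 6)])
```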
On top of that, YAML can serialize all other data types and classes:
- Alias and Anchor
For serializing References / Pointers, including circular references.
- Tag
With Tags it's possible to define custom types/classes.
For example, in many languages a Regular Expression is a builtin data type or object.
Some languages have only arrays, which are represented by the basic sequence type.
But some have tuples, which need a custom tag.
In addition to the indentation-based Block Style, there is a more compact Flow Style syntax.
One YAML File (or Stream) can consist of more than one Document.
Tutorial
The following examples will introduce you to YAML syntax elements step by step.
Invoice
Let's write an invoice.
It has a number, a name and an address, order items and more.
Mapping
The most common top-level data type is the mapping. A mapping maps keys to values. Keys and values are separated by a colon followed by a space (: ).
Each Key/Value pair is on its own line.
    invoice number: 314159
    name: Santa Claus
    address: North Pole
An alternative way to write it:
    ---
    invoice number: 314159
    name: Santa Claus
    address: North Pole
The --- explicitly starts a Document.
It marks the following content as YAML, but it is optional.
It has some use cases, and it is needed when you have multiple Documents in one file.
Read more about it in the Document Chapter.
Nested Mappings
Now we replace the address string with another mapping. In that case the colon is followed by a linebreak. In Block Style, mapping values that are not scalars must always start on a new line.
Nested items must always be indented more than the parent node, by at least one space. The typical indentation is two spaces.
Tabs are forbidden as indentation.
    invoice number: 314159
    name: Santa Claus
    address:
      street: Santa Claus Lane
      zip: 12345
      city: North Pole
Don't forget the indentation. If you write it like this:
    invoice number: 314159
    name: Santa Claus
    address:
    street: Santa Claus Lane
    zip: 12345
    city: North Pole
... then it will actually mean this:
    invoice number: 314159
    name: Santa Claus
    address: null
    street: Santa Claus Lane
    zip: 12345
    city: North Pole
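The difference between the two variants is easier to see in the loaded data. The following sketch writes out, by hand, the Python structures a loader would typically produce for the indented and the non-indented version. The Python types are an assumption; no YAML library is used here.

```python
# Hand-written sketch of the loaded data for both variants (assumed types).

indented = {
    "invoice number": 314159,
    "name": "Santa Claus",
    "address": {                      # nested mapping
        "street": "Santa Claus Lane",
        "zip": 12345,
        "city": "North Pole",
    },
}

not_indented = {
    "invoice number": 314159,
    "name": "Santa Claus",
    "address": None,                  # empty value, loaded as null
    "street": "Santa Claus Lane",     # now a top-level key
    "zip": 12345,
    "city": "North Pole",
}

print(len(indented))       # 3 top-level keys
print(len(not_indented))   # 6 top-level keys
```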
Sequence
A sequence is a list (or array) of scalars (or other sequences or
mappings).
A sequence item starts with a hyphen and a space - .
Here is the list of YAML inventors:
    - Oren Ben-Kiki
    - Clark Evans
    - Ingy döt Net
Now back to our invoice.
We map a list of scalars to the key order items.
The sequence must start on the next line:
    invoice number: 314159
    name: Santa Claus
    address:
      street: Santa Claus Lane
      zip: 12345
      city: North Pole
    order items:
      - Sled
      - Wrapping Paper
Because the - counts as indentation, you can also write it like this:
    invoice number: 314159
    name: Santa Claus
    address:
      street: Santa Claus Lane
      zip: 12345
      city: North Pole
    order items:
    - Sled
    - Wrapping Paper
Nested Sequences
You can also nest sequences. The typical example is a List of Dice Rolls.
The nested sequence items can follow directly on the same line:
    ---
    - - 2
      - 3
    - - 3
      - 6
YAML lets you write that in a more compact way, the Flow Style:
    ---
    - [ 2, 3 ]
    - [ 3, 6 ]
Read more about it in the Flow Style Chapter.
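Flow Style sequences of plain numbers happen to be valid JSON as well, so Python's standard json module can serve as a quick check of what the compact form means. This is only a sketch: json is not a YAML parser, and it accepts only the subset of Flow Style that is also JSON.

```python
import json

# The two flow sequences from the dice roll example, one per outer item.
flow_items = ["[ 2, 3 ]", "[ 3, 6 ]"]

# Each item is valid JSON, so json.loads returns the inner list directly.
dice_rolls = [json.loads(item) for item in flow_items]

print(dice_rolls)  # [[2, 3], [3, 6]]
```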
Aliases / Anchors
Let's add a billing address to the invoice.
In our case it is the same as the shipping address. We rename address to shipping address and add billing address:
    invoice number: 314159
    name: Santa Claus
    shipping address:
      street: Santa Claus Lane
      zip: 12345
      city: North Pole
    billing address:
      street: Santa Claus Lane
      zip: 12345
      city: North Pole
    order items:
    - Sled
    - Wrapping Paper
Now that's a bit wasted space. If it's the same address, you don't need to repeat it. Use an Alias.
In the native data structure of a programming language, this would be a reference, pointer, or alias.
Before an Alias can be used, it has to be created with an Anchor:
    invoice number: 314159
    name: Santa Claus
    shipping address: &address   # Anchor
      street: Santa Claus Lane   # ┐
      zip: 12345                 # │ Anchor content
      city: North Pole           # ┘
    billing address: *address    # Alias
    order items:
    - Sled
    - Wrapping Paper
When loaded into a native data structure, the shipping address and
billing address point to the same data structure.
It depends on the capabilities of the programming language how this is
implemented internally.
Read more about it in the Alias Chapter.
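As a sketch of what "point to the same data structure" can mean in practice, here is the anchored invoice written by hand as Python data. It assumes a loader that maps an Alias to the very same object as its Anchor. Not every language or library does it this way.

```python
# Hand-written sketch of the loaded invoice.
# Assumption: the loader maps an alias to the very same Python object.
address = {"street": "Santa Claus Lane", "zip": 12345, "city": "North Pole"}

invoice = {
    "invoice number": 314159,
    "name": "Santa Claus",
    "shipping address": address,   # &address
    "billing address": address,    # *address
    "order items": ["Sled", "Wrapping Paper"],
}

# Both keys refer to one object, so a change is visible through either key.
invoice["shipping address"]["zip"] = 54321
print(invoice["billing address"]["zip"])                          # 54321
print(invoice["shipping address"] is invoice["billing address"])  # True
```

If the loader copied the data instead, the two addresses would be equal at first but could drift apart after a change.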
Configuration Management
YAML is used in all kinds of applications as a configuration language.
Continuous Integration
One category is the configuration of Continuous Integration systems.
Here is a minimal example of a GitHub Actions workflow.
    name: Linux
    on: [push] # Compact Flow Style Sequence
    jobs:
      build:
        name: Run Tests
        runs-on: ubuntu-latest
        steps:
        - name: Say Hello
          run: echo hello
The value for steps is a list of mappings. A mapping can start directly on the same line as the -.
Usually a step has a name, which will be shown as the title when running the job, and a run, which is a shell command, or multiple commands.
Let's add a more realistic scenario, with one step to check out the code, and one with multiple commands.
If you use Double Quotes, which work like JSON strings, it looks like this:
    steps:
    # Plugin provided by GitHub to checkout the code
    - uses: actions/checkout@v2
    # Run multiple commands
    - name: Run Tests
      run: "./configure\nmake\nmake test\n"
One of the advantages of YAML here is that this can be formatted in a way that's easy to write and read with Block Scalars:
    steps:
    - uses: actions/checkout@v2
    - name: Run Tests
      run: | # Literal Block Scalar
        ./configure
        make
        make test
The Literal Block Scalar, as the name says, contains the literal content of the string. Tabs and similar characters are always literal. All trailing spaces will be kept.
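A quick way to convince yourself that the double-quoted form and the Literal Block Scalar describe the same string: JSON uses the same \n escape, so Python's standard json module can decode the quoted version. This is a sketch only. json is not a YAML parser, and the literal content is written out by hand, assuming the default behaviour of keeping one final line break.

```python
import json

# The double-quoted scalar exactly as it appears in the YAML file.
quoted = r'"./configure\nmake\nmake test\n"'

# The content of the literal block scalar: one command per line,
# each followed by a line break.
literal = "./configure\nmake\nmake test\n"

decoded = json.loads(quoted)
print(decoded == literal)     # True
print(decoded.splitlines())   # ['./configure', 'make', 'make test']
```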
Let's say you have a number of longer commands that you would like to break up into multiple lines for readability:
    steps:
    - uses: actions/checkout@v2
    - name: Install dependencies
      run: > # Folded Block Scalar
        apt-get update && apt-get install -y
        git tig vim jq tmux tmate git-subrepo cpanminus

        cpanm -n -l local
        YAML::PP YAML::XS ...
The Folded Block Scalar is like the Literal Block Scalar, but with special folding rules.
Consecutive lines starting at the same indentation level will be folded with spaces, and empty lines create a linebreak.
Read more about Block Scalars and all other ways of quoting in the Quoting Chapter.
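The folding rule described above can be sketched in a few lines of Python. This is a simplified illustration, not how any particular YAML library implements it: the hypothetical fold function assumes that all lines share the same indentation (already stripped), ignores more-indented lines and leading empty lines, and keeps a single final line break. The sample commands are shortened, made-up stand-ins.

```python
def fold(block: str) -> str:
    """Simplified folding of the content of a folded block scalar.

    Assumptions: every line has the same indentation (already stripped)
    and a single final line break is kept.
    """
    pieces = []
    empty_lines = 0
    for line in block.splitlines():
        if line.strip() == "":
            empty_lines += 1          # remember empty lines between text
            continue
        if pieces:
            # A single line break folds into a space;
            # each empty line in between becomes a line break.
            pieces.append("\n" * empty_lines if empty_lines else " ")
        pieces.append(line)
        empty_lines = 0
    return ("".join(pieces) + "\n") if pieces else ""


# Shortened, hypothetical commands in the shape of the example above.
block = "apt-get update &&\napt-get install -y git\n\ncpanm -n YAML::PP\n"
print(fold(block), end="")
```

With this input the first two lines are joined by a space, and the empty line separates them from the second command with a single line break.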
Variables
YAML itself has no concept of "variables" or "functions".
Systems like GitHub Actions usually provide a way to access certain information and environment variables with a Templating Syntax.
We set up a "matrix" test to build the code with gcc and clang.
    strategy:
      matrix:
        compiler: [gcc, clang]
    steps:
    - ...
The strategy.matrix entry will create two jobs instead of one, providing the compiler in a "context" item that we can pass as an environment variable to the step:
    strategy:
      matrix:
        compiler: [gcc, clang]
    steps:
    - uses: actions/checkout@v2
    - name: Run Tests
      env:
        CC: ${{ matrix.compiler }}
      run: |
        ./configure
        make
        make test
This sets the environment variable CC to gcc or clang, respectively.
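The idea of a matrix producing one job per value can be sketched as follows. This is not GitHub's implementation, just an illustration of the expansion under the assumption that every combination of matrix values becomes its own job.

```python
from itertools import product

# The matrix from the workflow above, already loaded into a Python dict.
matrix = {"compiler": ["gcc", "clang"]}

# One job per combination of matrix values (here: one per compiler).
jobs = [dict(zip(matrix, combo)) for combo in product(*matrix.values())]

for job in jobs:
    # The value that would be passed to the step as CC.
    print("CC=" + job["compiler"])
```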
The ${{ matrix.compiler }} syntax is not a special YAML syntax.
It is a simple plain scalar that could also have been written in quotes:
    env:
      CC: '${{ matrix.compiler }}'
It's the GitHub Actions application that recognizes such variables and replaces them with their content at runtime.
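To illustrate that the replacement happens in the application and not in YAML, here is a minimal sketch of such a substitution step. The render function and its lookup rules are hypothetical and much simpler than what GitHub Actions really does. It only shows that the loaded scalar is an ordinary string that is processed afterwards.

```python
import re

# Matches ${{ some.dotted.name }} with optional spaces inside the braces.
EXPRESSION = re.compile(r"\$\{\{\s*([\w.]+)\s*\}\}")


def render(scalar: str, context: dict) -> str:
    """Replace each ${{ a.b }} with the value found at context["a"]["b"]."""

    def lookup(match: re.Match) -> str:
        value = context
        for part in match.group(1).split("."):
            value = value[part]       # walk down one level per name part
        return str(value)

    return EXPRESSION.sub(lookup, scalar)


# The string as a YAML loader would hand it to the application.
print(render("${{ matrix.compiler }}", {"matrix": {"compiler": "gcc"}}))  # gcc
```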
Such variables can look different, depending on the application.
For example, Ansible uses the Jinja2 templating engine, where variables look like this:
    with_items: '{{ user.names }}'
It is important to add quotes here, because a { at the start would otherwise begin a Flow Style Mapping.
So it's clever that GitHub Actions chose the ${{ ... }} syntax, because the $ at the start is not special in YAML and doesn't need quotes.