Ansible: Simple yet powerful automation

Automatic provisioning of infrastructure as well as deployment is a cornerstone of DevOps. It brings the benefits of version control, reproducibility, and a central place to consolidate (executable) knowledge about infrastructure setups. The best-known provisioning systems are Chef and Puppet. A newcomer to this game is Ansible, which we chose for setting up our next-generation production environment for CenterDevice. This post is intended to give you an introduction to both Ansible itself and the things my colleague Lukas Pustina and I have learned while using it.

There are other places on the net comparing Ansible, Puppet, Chef and more solutions to the automation problem in depth. This article is not intended to be a shootout between these, but rather to give you an overview and some basic knowledge about the way Ansible attacks the problem.

Where do we want to go?

For our next-generation production cluster we decided to start fresh with a better-planned infrastructure as the basis for more reliable and maintainable testing and production systems. The current setup does not use the available resources to their full potential. Using the Ceph object store and OpenStack Icehouse on a set of six off-the-shelf servers, we plan to make much better use of compute power and storage capacity.

How do we get there?

One of several reasons we decided to evaluate Ansible was its independence of any sort of agent or client component on the systems to be managed. The only thing required is a running SSH daemon (plus a Python interpreter, which most Linux distributions ship by default), which is true of practically any Linux server out of the box. This does not take care of the initial provisioning of the hardware after it has been mounted into the rack, though. Due to the relatively small number of machines we decided against pursuing an automated approach to that for the time being and set them up via network boot and initial manual configuration.

These manual steps ended with the creation of the identical dedicated administrative user on all machines. From there on, Ansible took over.

Note: All demo files shown below are available on GitHub if you would like to inspect them more closely.

Inventories

Ansible needs to know about the machines you intend to provision. This information is stored in inventories. An inventory is a simple text file that collects and groups DNS hostnames or IP addresses like this:

[controllers]
control01.baremetal
control02.baremetal
 
[nodes]
node01.baremetal
node02.baremetal
 
[baremetals:children]
controllers
nodes

Notice that there are two host groups called controllers and nodes with two members each. All hosts of both groups are aggregated in another group called baremetals using the special children keyword.

Grouping machines like this provides a simple yet powerful organization of your infrastructure. There are even more interesting things inventories can do, but for now let us stick to this simple case.

Note: The examples below assume that these machines are reachable by these names via DNS and that a local user called local-admin exists and can connect to an SSH daemon without a password using an SSH authorized_keys file. Passwordless logins are not a requirement for Ansible, but make daily use easier.

Playbooks

Now that the target hardware has been inventoried and put into logical groups we can tell Ansible what steps it shall execute on one or more hosts to bring them into the desired state.

Where inventories organize infrastructure, playbooks (simple YAML files) organize the work to be executed on hosts or host groups into logical units. For example, let’s assume that we need to make sure that all machines in our cluster know about our organization-wide root CA certificate that is used to ultimately sign all kinds of TLS certificates, software packages and the like. So our first playbook needs to contain instructions on how to copy the required files to each host and update the root certificate database. Listing 1 shows this playbook called site.yml. Please note that site.yml is usually the “master” playbook which includes more specific ones. For this example, however, we will stick to it to keep things a little simpler.

---
- hosts: baremetals
  roles:
    - base

The hosts line refers to the host group defined in the inventory above. In effect this means that all controllers and all nodes will be targeted.

Now, what is this roles thing, and why is there no mention of certificates or databases?

While playbooks can indeed contain very concrete tasks that Ansible executes on the corresponding machines, in the long run it pays off to organize tasks into more fine-grained and reusable units called roles. For this example, the role is called base and looks like this:

---
# Base Setup 

- name: Install CenterDevice Root CA Certificates
  sudo: true
  copy: src=usr/local/share/ca-certificates/{{ item }} dest=/usr/local/share/ca-certificates/{{ item }}
  with_items:
    - centerdevice-intermediate-ca.crt
    - centerdevice-root-ca.crt

- name: Update root certificate database
  sudo: true
  command: update-ca-certificates

As you see it follows a very simple and easy to read structure: For members of this role two tasks are executed during Ansible’s provisioning. Each one has a human readable name that identifies the task in log outputs and can also be used to refer to it directly. Names are not mandatory, but we made a habit of using clear and verbose names to convey what exactly the task is supposed to achieve at a glance. Let’s go through both tasks step by step.

The first task installs two CA certificate files into the appropriate destination directory. On Ubuntu this directory is /usr/local/share/ca-certificates/. As the destination path is not writable by everyone, we specify sudo: true to tell Ansible that for successful execution on the remote machine root privileges are required. The copy: line instructs Ansible to transfer a local file to the remote machine, specifying both local and remote paths. To make things a little more interesting, we are using a variable called item here to loop through a set of multiple files that all need to be copied.

This variable is filled with each value from the with_items: subsection in the next line.

That’s all there is to it. Ansible will check if the remote path exists and copy both files. Before doing so it verifies whether the files are already present with the same content; in that case it skips the copy. Most Ansible tasks support a wide range of parameters to fine-tune their behavior. For example, we could have set file ownership and permissions right in the copy task.
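To illustrate the remark about ownership and permissions, here is a minimal sketch of the same copy task with additional parameters. It uses the same sudo: true syntax as the rest of this post (newer Ansible releases use become: true instead). The owner, group and mode values are assumptions chosen for illustration, not settings taken from our actual role.

```yaml
# Sketch only: the copy task from above, extended with ownership and permissions.
# owner/group/mode values are illustrative assumptions.
- name: Install CenterDevice Root CA Certificates
  sudo: true
  copy: src=usr/local/share/ca-certificates/{{ item }} dest=/usr/local/share/ca-certificates/{{ item }} owner=root group=root mode=0644
  with_items:
    - centerdevice-intermediate-ca.crt
    - centerdevice-root-ca.crt
```

With these parameters the task should also report changed: when only the ownership or mode of an existing file differs, not just its contents.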

Once the copying has finished successfully, the next task is executed to process the newly copied files. To do so, it runs the update-ca-certificates command on the remote machine, again using sudo: true to run with the necessary permissions.
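Since a playbook is a list of plays, and each play names its own host group, a larger site.yml could apply the base role everywhere and more specific roles per group. The following is only a sketch under that assumption. The role names controller and node are hypothetical and do not exist in the demo files.

```yaml
---
# site.yml (sketch): one play per host group from the inventory

# Everything that all machines have in common
- hosts: baremetals
  roles:
    - base

# "controller" and "node" are hypothetical role names, used for illustration only
- hosts: controllers
  roles:
    - controller

- hosts: nodes
  roles:
    - node
```

Plays are processed from top to bottom, so the base role would be applied to all machines before the group-specific plays start.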

Running the first playbook

Now that we have an inventory and our first playbook and role, it is time to finally see some action.
Ansible uses convention over configuration, so we will put the files mentioned before into a directory structure like this on our local workstation:

ansible-demo-scripts
|-- inventories
|   \-- hosts.baremetal
|-- roles
|   \-- base
|       |-- files
|       |   \-- usr
|       |       \-- local
|       |           \-- share
|       |               \-- ca-certificates
|       |                   |-- centerdevice-intermediate-ca.crt
|       |                   \-- centerdevice-root-ca.crt
|       \-- tasks
|           \-- main.yml
\-- site.yml        

You will find the role name base reflected in the directory structure beneath the roles directory, as well as the source files to be copied over under the files directory for that role. The actual tasks for that role are stored in the main.yml file in the tasks subdirectory. For more details on the directory structure conventions and recommendations, have a look at the excellent Ansible documentation on the topic.

With the ansible-demo-scripts directory being the current working directory, execute the ansible-playbook command:

ansible-demo-scripts$  ansible-playbook -u local-admin --ask-sudo-pass -i inventories/hosts.baremetal site.yml
sudo password: ******
 
PLAY [baremetals] ************************************************************* 
 
GATHERING FACTS *************************************************************** 
ok: [control02.baremetal]
ok: [node02.baremetal]
ok: [control01.baremetal]
ok: [node01.baremetal]
 
TASK: [base | Install CenterDevice Root CA Certificates] ********************** 
changed: [control01.baremetal] => (item=centerdevice-intermediate-ca.crt)
changed: [control02.baremetal] => (item=centerdevice-intermediate-ca.crt)
changed: [node01.baremetal] => (item=centerdevice-intermediate-ca.crt)
changed: [node02.baremetal] => (item=centerdevice-intermediate-ca.crt)
changed: [node02.baremetal] => (item=centerdevice-root-ca.crt)
changed: [control01.baremetal] => (item=centerdevice-root-ca.crt)
changed: [node01.baremetal] => (item=centerdevice-root-ca.crt)
changed: [control02.baremetal] => (item=centerdevice-root-ca.crt)
 
TASK: [base | Update root certificate database] ******************************* 
changed: [control02.baremetal]
changed: [node01.baremetal]
changed: [control01.baremetal]
changed: [node02.baremetal]
 
PLAY RECAP ******************************************************************** 
control01.baremetal        : ok=3    changed=2    unreachable=0    failed=0
control02.baremetal        : ok=3    changed=2    unreachable=0    failed=0
node01.baremetal           : ok=3    changed=2    unreachable=0    failed=0
node02.baremetal           : ok=3    changed=2    unreachable=0    failed=0

The --ask-sudo-pass parameter is only needed when the local-admin user cannot use sudo without entering a password. Likewise, if your local ~/.ssh/config file already specifies the remote username, the -u parameter can be omitted.

With -i (or the longer form --inventory-file) we specify the inventory file to use and lastly that we want to play the site.yml playbook. Ansible asks for the sudo password and goes on to carry out our wishes. Let’s go through the output.

First, we get a summary of what shall be executed: PLAY [baremetals] states that the host group baremetals is being targeted, just as we specified at the top of site.yml. Next up is the GATHERING FACTS section. Before executing any commands, Ansible collects a set of data about each machine it connects to. These facts include the hostname, IP addresses, network interface names, the time zone, hardware information and a lot more. Have a look at the Ansible facts documentation to see the full picture. The information compiled in this step can be used as variables during playbook execution for more sophisticated behaviors.

You will notice that the ok: … lines in the fact gathering phase are out of order. This is due to the fact that Ansible connects to several hosts simultaneously (5 by default) to speed up execution. Depending on how quickly each machine responds, the log order can change.

Once the facts have been gathered, the actual tasks defined in our base role are executed. Again, you can see that the order of the log depends on the responsiveness of the remote machines. As two files get copied over to each of the 4 machines, there are 8 lines in total logged for this task.
The changed: prefix on each line means that the machine received the file named later in the line, either because it was not present at all or because it had different contents.

Having completed the first task, Ansible goes on to the next, executing the certificate update command on each host. As Ansible cannot know about the actual effects this generic command execution has on the remote system, it reports changed: here as well.

Once all tasks have been executed, you get a PLAY RECAP that provides an overview of the whole run. In our case, the counters for all hosts are identical, because no tasks failed, all machines could be reached and they all had the same state to begin with.
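The facts collected in the GATHERING FACTS step are available as ordinary variables in tasks. As a minimal sketch, assuming two hypothetical tasks that are not part of the demo role, they could be used for log output and for conditional execution like this:

```yaml
# Hypothetical tasks, for illustration only (not part of the demo files)

# Print a few standard facts for each host
- name: Show a few gathered facts
  debug:
    msg: "{{ ansible_hostname }} runs {{ ansible_distribution }} {{ ansible_distribution_version }}"

# Run the certificate update only where the distribution fact says Ubuntu
- name: Update root certificate database (Ubuntu only)
  sudo: true
  command: update-ca-certificates
  when: ansible_distribution == "Ubuntu"
```

On hosts where the when: condition does not hold, Ansible reports the task as skipped instead of executing it.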

Show me some smarts

Now let’s see what happens if we change the preconditions a little. After all, the purpose of automatic provisioning is to get machines into a specific state, but not necessarily starting with the same baseline. Before running the ansible-playbook command again, I replaced one of the certificate files on one of the nodes with a different file:

ansible-demo-scripts$  ansible-playbook -i inventories/hosts.baremetal site.yml --ask-sudo-pass
sudo password: *******
 
PLAY [baremetals] ************************************************************* 
 
GATHERING FACTS *************************************************************** 
ok: [node01.baremetal]
ok: [control01.baremetal]
ok: [control02.baremetal]
ok: [node02.baremetal]
 
TASK: [base | Install CenterDevice Root CA Certificates] ********************** 
ok: [control02.baremetal] => (item=centerdevice-intermediate-ca.crt)
ok: [control01.baremetal] => (item=centerdevice-intermediate-ca.crt)
ok: [node01.baremetal] => (item=centerdevice-intermediate-ca.crt)
changed: [node02.baremetal] => (item=centerdevice-intermediate-ca.crt)
ok: [control02.baremetal] => (item=centerdevice-root-ca.crt)
ok: [control01.baremetal] => (item=centerdevice-root-ca.crt)
ok: [node01.baremetal] => (item=centerdevice-root-ca.crt)
ok: [node02.baremetal] => (item=centerdevice-root-ca.crt)
 
TASK: [base | Update root certificate database] ******************************* 
changed: [control02.baremetal]
changed: [control01.baremetal]
changed: [node01.baremetal]
changed: [node02.baremetal]
 
PLAY RECAP ******************************************************************** 
control01.baremetal        : ok=3    changed=1    unreachable=0    failed=0   
control02.baremetal        : ok=3    changed=1    unreachable=0    failed=0   
node01.baremetal           : ok=3    changed=1    unreachable=0    failed=0   
node02.baremetal           : ok=3    changed=2    unreachable=0    failed=0

The output shows that Ansible detected the changed file on node02 and replaced it with our pristine copy, while leaving the other files untouched. Again, as the generic command execution in the second task does not provide any information about its side effects, it is executed for all machines again, resulting in the same changed: status as before.

Variables and Templates

So far we copied static files to a remote machine and triggered remote command execution. Now let’s continue with a slightly more involved example.

On our nodes we want to have an NTP daemon installed to keep clocks in sync with specific time servers. In order to do so, we need to install the appropriate package, set up the configuration and make sure the daemon gets started on boot.

Assuming we have our own set of time servers, we could hardwire their addresses into the NTP config file. In the interest of easier maintenance and central configuration, however, we extract this information into a variable and let Ansible dynamically put it into the daemon configuration for us.

Variables in Ansible can have different scopes. They can be valid for a single host, a group of hosts or a whole site. In our example, since all baremetal machines shall be synchronized with the NTP servers, we will create a group scope variable definition.
Following the directory structure conventions introduced above, the variable definition file used for all our baremetal machines is stored as baremetals.yml in the newly created ansible-demo-scripts/group_vars directory.

---
# Variables for all baremetal hosts
NTP_SERVERS:
  - 192.168.0.150
  - 192.168.0.151
  - 192.168.0.152
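Because variables can also be scoped to a single host, one machine could override the group-wide value. The following is a minimal sketch, assuming a hypothetical file host_vars/node01.baremetal.yml next to the group_vars directory. The server address in it is made up for illustration and is not part of our setup.

```yaml
---
# host_vars/node01.baremetal.yml (hypothetical): applies to node01.baremetal only
# The address below is an illustrative assumption.
NTP_SERVERS:
  - 192.168.0.153
```

A host-level definition takes precedence over the group-level one, so node01 would use this list while all other baremetal machines keep the three servers from baremetals.yml.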

Now, in the tasks file of our base role (roles/base/tasks/main.yml), we add the required steps:


- name: Install NTP daemon
  sudo: true
  apt: pkg=ntp state=present

- name: Ensure NTP daemon autostart
  sudo: true
  service: name=ntp enabled=yes

- name: Setup NTP daemon config
  sudo: true
  template: src=etc/ntp.conf.j2 dest=/etc/ntp.conf
  notify: Restart NTP daemon

The first task is quite simple: Make sure the ntp package is installed via the apt task. Then, in the next task, we make sure the service is set to automatically start when the system boots. Usually the package’s post-install script will take care of this, but we make sure just in case. Finally, in the last task, there are some more interesting things going on.

Similar to the copy task, the template task transfers a local file to the remote system. However, instead of simply carrying the file over verbatim, it evaluates it as a Jinja2 template, allowing us to dynamically construct the file based on facts and other variables.

As per the convention, Ansible looks for templates used in the base role in ansible-demo-scripts/roles/base/templates. So we create a file etc/ntp.conf.j2 there, which is a copy of a regular ntp.conf file, but with the following modification:

#
# removed all server lines referring to default ubuntu time servers here.
# these are our own servers:
#
{% for item in NTP_SERVERS %}
server {{ item }}
{% endfor %} 

This snippet will loop through all values we set up in the group variable file above for the NTP_SERVERS variable, effectively creating 3 new lines in the final configuration.

The final notify: Restart NTP daemon line of this template task tells Ansible to execute a special kind of action, a so-called handler, if and only if the template task actually changed the destination file from its previous contents. This is useful to only bump the service if its configuration has actually changed, making the playbook safe to run repeatedly without causing unnecessary service disruptions.

This handler is the last thing we need to define. It should be no surprise that it is stored in ansible-demo-scripts/roles/base/handlers/main.yml:

---
- name: Restart NTP daemon
  sudo: true
  service: name=ntp state=restarted

Now, it is time to re-run our playbook:

ansible-demo-scripts$  ansible-playbook -i inventories/hosts.baremetal site.yml --ask-sudo-pass
sudo password: *******
 
PLAY [baremetals] ************************************************************* 
 
GATHERING FACTS *************************************************************** 
…
 
TASK: [base | Install ntp daemon] ********************************************* 
changed: [control01.baremetal]
changed: [control02.baremetal]
changed: [node02.baremetal]
changed: [node01.baremetal]
 
TASK: [base | Setup ntp daemon] *********************************************** 
changed: [control02.baremetal]
changed: [node02.baremetal]
changed: [control01.baremetal]
changed: [node01.baremetal]
 
NOTIFIED: [base | Restart ntp daemon] ***************************************** 
changed: [control02.baremetal]
changed: [node01.baremetal]
changed: [node02.baremetal]
changed: [control01.baremetal]
 
PLAY RECAP ********************************************************************

The log shows the package being installed, the configuration set up and finally the service restarted. To verify that everything is ok, we can take a look at the configuration on one of the nodes:

ansible-demo-scripts$  ssh node01.baremetal grep "server\ 192" /etc/ntp.conf
server 192.168.0.150
server 192.168.0.151
server 192.168.0.152

And just for good measure, we can run the playbook again to demonstrate the intelligent daemon restart based on actually changed files:

ansible-demo-scripts$  ansible-playbook -i inventories/hosts.baremetal site.yml --ask-sudo-pass
sudo password: *******
 
PLAY [baremetals] ************************************************************* 
 
GATHERING FACTS *************************************************************** 
…
 
TASK: [base | Ensure ntp daemon autostart] ************************************ 
ok: [control01.baremetal]
ok: [node01.baremetal]
ok: [node02.baremetal]
ok: [control02.baremetal]
 
TASK: [base | Setup ntp daemon config] **************************************** 
ok: [control02.baremetal]
ok: [node02.baremetal]
ok: [control01.baremetal]
ok: [node01.baremetal]

As there were no modifications to the config files, the handler was not notified, leaving the service running.
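The same notify mechanism could also address the certificate update from the first example, which reported changed: on every run. The following is a minimal sketch, assuming the update command only needs to run when a certificate file was actually replaced. It is a possible variation, not the version in our demo files.

```yaml
# roles/base/tasks/main.yml (sketch): the copy task now notifies a handler
- name: Install CenterDevice Root CA Certificates
  sudo: true
  copy: src=usr/local/share/ca-certificates/{{ item }} dest=/usr/local/share/ca-certificates/{{ item }}
  with_items:
    - centerdevice-intermediate-ca.crt
    - centerdevice-root-ca.crt
  notify: Update root certificate database

# roles/base/handlers/main.yml (sketch): runs only if the copy task reported a change
- name: Update root certificate database
  sudo: true
  command: update-ca-certificates
```

Keep in mind that handlers are run after the regular tasks of the play, so any later task that depends on the updated certificate database would have to take that ordering into account.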

Conclusion

The above examples showed but a small part of what Ansible can do, even with just a very limited effort.

Of course, as with any tool, you need to get familiar with its capabilities, weaknesses and quirks to leverage it to its full potential. However, we have found that in a short time we could achieve very useful results without any prior knowledge. The documentation is excellent, and there is a huge variety of built-in tasks that can be called upon. Among these are database setup, different package management variants, basic and complex file handling, arbitrary command execution, conditional execution based on the outcome of previous tasks and much more. Just browse through the Module documentation to get inspired.
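As a small taste of conditional execution based on the outcome of a previous task, here is a minimal sketch using the stat and debug modules. The task names and the variable name ntp_conf are hypothetical and chosen for illustration only.

```yaml
# Hypothetical tasks, for illustration only

# Store the result of the stat module in a variable
- name: Check for an existing NTP configuration
  stat: path=/etc/ntp.conf
  register: ntp_conf

# Only runs on hosts where the file was not found
- name: Report when the configuration is missing
  debug:
    msg: "No /etc/ntp.conf found on {{ inventory_hostname }}"
  when: not ntp_conf.stat.exists
```

The register keyword makes the result of one task available to later ones, and when: decides per host whether a task is executed or skipped.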

Combined with the powerful Jinja2 template syntax there is little that cannot be done, even though you need to be careful not to get carried away.

In an upcoming blogpost, we will present a way to get even more flexible by using dynamically generated inventories instead of static files to feed back into the playbook execution.

Daniel Schneller has been designing and implementing complex software and database systems for more than 15 years and is the author of the MySQL Admin Cookbook. His current job title is Principal Cloud Engineer at CenterDevice GmbH, where he focuses on OpenStack and Ceph based cloud technologies. He has given talks at FroSCon, Data2Day and DWX Developer Week among others.

Comments

  • 29. June 2014 by pieterjan

    Could you do a separate blog post/tutorial on jinja.
    I would be deeply interested in how you could edit certain values of key in a config file.

    • 1. July 2014 by Daniel Schneller

      I put it onto my todo list. There are a few places that would probably make interesting examples in our config files.

      • 1. July 2014 by pieterjan

        Well i’m looking forward to it. I must say that this blog post is good so damn good. It gives examples of pretty much the basics of ansible and what gives a good view of what is possible with it. How about starting a mini series on youtubu with some hands on content?

  • 1. July 2014 by pieterjan

    So if i get it right you need to to create a copy of the file you want to template, define the variables in it. Change the extension to .j2 and bundle it with your playbook?

    • 1. July 2014 by Daniel Schneller

      That is the basic idea. Our usual procedure is to do exactly that. On a dedicated machine we install the package, take the config file, “templatize” the relevant values and then use it to deploy the actual cluster nodes.
