Taming the Hybrid Swarm: Initializing a Mixed OS Docker Swarm Cluster Running Windows & Linux Native Containers with Vagrant & Ansible

We successfully scaled our Windows Docker containers running on one Docker host. But what if we change our focus and see our distributed application as a whole, running on multiple hosts using both Windows and Linux native containers? In this case, a multi-node Docker Orchestration tool like Docker Swarm could be a marvelous option!

Running Spring Boot Apps on Windows – Blog series

Part 1: Running Spring Boot Apps on Windows with Ansible
Part 2: Running Spring Boot Apps on Docker Windows Containers with Ansible: A Complete Guide incl Packer, Vagrant & Powershell
Part 3: Scaling Spring Boot Apps on Docker Windows Containers with Ansible: A Complete Guide incl Spring Cloud Netflix and Docker Compose
Part 4: Taming the Hybrid Swarm: Initializing a Mixed OS Docker Swarm Cluster running Windows & Linux Native Containers with Vagrant & Ansible

Lifting our gaze to the application as a whole

We really went far in terms of using native Docker Windows containers to run our apps inside. We built our own Windows Vagrant boxes with Packer, prepared them to run Docker smoothly and provisioned our Apps – both fully automated with Ansible. We also scaled our Windows Docker containers using Docker Compose and Spring Cloud Netflix, not leaving our fully comprehensible setup and our willingness to have everything as code behind.

But if you look at real-world projects, there are no single-node setups anymore – running Docker or not. Applications today consist of a whole bunch of machines – and they naturally mix Linux and Windows. Such projects need a solution to handle these distributed applications – ideally without doing everything with completely new tools. But how is this possible?

Why Docker Swarm?

This post is all about the “new” Docker Swarm mode, which requires Docker 1.12.0 as a minimum. But why did I choose this path? Today everything seems to point to Kubernetes: it has the biggest media share, the most Google searches, the most blog posts and so on. But there are a few things to consider before going with Kubernetes.

The first point is simple: a consultant´s real-world project experience. After in-depth discussions, you may have shifted your project team to Dockerize all the (legacy) applications and finally brought all these containers into production. Always remember: this is huge! And in my experience, not every team member has fully realized at that point what changes were applied to the team´s applications in detail, which may leave some people unsure about “all this new stuff”. Now imagine you want to take the next step with Kubernetes. That means many of those “new” Docker concepts are thrown overboard again – because Kubernetes brings in a whole bunch of new building blocks, leaving no stone unturned… And every blog post about Kubernetes and every colleague I talk to has to admit at some point that the learning curve with Kubernetes is really steep.

Second point: Many people at conferences promote the following progression: they tell you about “Docker 101” with the simplest steps with Docker and then go straight ahead to Kubernetes as the “next logical step”. Well guys, there´s something in between! It should be common sense that learning is ideally done step by step. The next step after Docker 101 is Docker Compose, adding a new level of abstraction. Coming from Compose, it is easy to continue with Docker Swarm – because it is built right into every Docker engine and can be used with Compose files as well. It´s just called “Docker Stack” then. 🙂 And if people really do need more features than Swarm provides, then Kubernetes is for sure a good way to go!
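
To make that “Docker Stack” point concrete: a Compose file in version 3 format can be deployed to a Swarm nearly unchanged. A minimal sketch (service name and image are placeholders, not taken from this post´s setup):

```yaml
# docker-compose.yml – version 3 files work for both
# `docker-compose up` and `docker stack deploy`
version: "3"
services:
  web:
    image: nginx:alpine   # placeholder image
    ports:
      - "8080:80"
    deploy:               # Swarm-only settings, ignored by plain docker-compose
      replicas: 2
```

On a Swarm manager node, docker stack deploy -c docker-compose.yml mystack spins this up as a replicated service – the same file, just a different command.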

Last point: Right now, a hybrid OS Docker cluster doesn´t really make sense with the released versions of Kubernetes and Windows Server 2016. Yes, Windows support was released with Kubernetes 1.6 (alpha already in 1.5). But if you dive a bit deeper – and that always involves reading through the Microsoft documentation until you reach the part about “current restrictions/limitations” – you´ll find the nasty things. As of now, the Windows network subsystem HNS isn´t really Kubernetes-ready; you have to plumb all the networking (like routing tables) together manually. And one container per pod doesn´t really make sense if you want to leverage the power of Kubernetes! Because the Windows SIG is doing a really great job, these restrictions won´t last much longer – most of them are planned to be solved with Kubernetes 1.8 and Windows Server 2016 Build 1709.

So if you want to run hybrid OS Docker clusters, just sit back and start with Docker Swarm. I think we´ll see a hybrid OS Kubernetes setup here on the blog in the near future, if Microsoft and the Windows SIG continue their work. 🙂

Building a multi-machine-ready Windows Vagrant box with Packer

Enough talk, let´s get our hands dirty! The last blog posts about Docker Windows containers already showed that only fully comprehensible setups will be used here. The claim is to not leave any stones in your way on the path from zero to a running Docker Swarm at the end of this article. Therefore the already well-known GitHub repository ansible-windows-docker-springboot was extended by the next step: step4-windows-linux-multimachine-vagrant-docker-swarm-setup.

There are basically two options to achieve a completely comprehensible multi-node setup: running more than one virtual machine on your local machine, or using some cloud infrastructure. As I really came to love Vagrant as a tool to handle my virtual machines, why not use it again? And thanks to a hint from a colleague of mine, I found that Vagrant is also able to handle multi-machine setups. This frees us from needing access to a certain cloud provider, although the setup would be easily adaptable to one.

The only thing that would prevent us from using Vagrant would be the lack of a Windows Server 2016 Vagrant box. But luckily this problem was already solved in the second part of this blog post´s series, and we can re-use the setup with Packer.io nearly one to one. There´s only a tiny difference in the Vagrantfile template for Packer: we shouldn´t define port forwarding or a concrete VirtualBox VM name in this base box. Therefore we need a separate Vagrantfile template vagrantfile-windows_2016-multimachine.template, which is smaller than the one used in the second blog post:

Vagrant.configure("2") do |config|
  config.vm.box = "windows_2016_docker_multi"
  config.vm.guest = :windows
 
  config.windows.halt_timeout = 15
 
  # Configure Vagrant to use WinRM instead of SSH
  config.vm.communicator = "winrm"
 
  # Configure WinRM Connectivity
  config.winrm.username = "vagrant"
  config.winrm.password = "vagrant"
end

To be able to use a different Vagrantfile template in Packer, I had to refactor the Packer configuration windows_server_2016_docker.json slightly to accept a Vagrantfile template name (via template_url) and a Vagrant box output name (via box_output_prefix) as parameters. Now we´re able to create another kind of Windows Vagrant box, which we can use in our multi-machine setup.

So let´s go to commandline, clone the mentioned GitHub repository ansible-windows-docker-springboot and run the following Packer command inside the step0-packer-windows-vagrantbox directory (just be sure to have a current Packer version installed):

packer build -var iso_url=14393.0.161119-1705.RS1_REFRESH_SERVER_EVAL_X64FRE_EN-US.ISO -var iso_checksum=70721288bbcdfe3239d8f8c0fae55f1f -var template_url=vagrantfile-windows_2016-multimachine.template -var box_output_prefix=windows_2016_docker_multimachine windows_server_2016_docker.json

This could take some time and you´re encouraged to grab a coffee. It´s finished when there´s a new windows_2016_docker_multimachine_virtualbox.box inside the step0-packer-windows-vagrantbox directory. Let´s finally add the new Windows 2016 Vagrant base box to the local Vagrant installation:

vagrant box add --name windows_2016_multimachine windows_2016_docker_multimachine_virtualbox.box

A multi-machine Windows & Linux mixed OS Vagrant setup for Docker Swarm

Now that we have our Windows Vagrant base box in place, we can move on to the next step: the multi-machine Vagrant setup. Just switch over to the step4-windows-linux-multimachine-vagrant-docker-swarm-setup directory and have a look at the Vagrantfile there. Here´s a shortened version where we can see the basic structure with the definition of our local cloud infrastructure:

Vagrant.configure("2") do |config|
 
    # One Master / Manager Node with Linux
    config.vm.define "masterlinux" do |masterlinux|
        masterlinux.vm.box = "ubuntu/trusty64"
        ...
    end
 
    # One Worker Node with Linux
    config.vm.define "workerlinux" do |workerlinux|
        workerlinux.vm.box = "ubuntu/trusty64"
        ...
    end
 
    # One Master / Manager Node with Windows Server 2016
    config.vm.define "masterwindows" do |masterwindows|
        masterwindows.vm.box = "windows_2016_multimachine"
        ...
    end
 
    # One Worker Node with Windows Server 2016
    config.vm.define "workerwindows" do |workerwindows|
        workerwindows.vm.box = "windows_2016_multimachine"
        ...
    end
 
end

It defines four machines, covering every role in a hybrid Docker Swarm cluster containing Windows and Linux boxes: Manager and Worker nodes, each on both Windows and Linux.

Packer Vagrant Setup

logo sources: Windows icon, Linux logo, Packer logo, Vagrant logo, VirtualBox logo

Within a Vagrant multi-machine setup, you define your separate machines with the config.vm.define keyword. Inside those define blocks, we simply configure each individual machine. Let´s have a more detailed look at the workerlinux box:

    # One Worker Node with Linux
    config.vm.define "workerlinux" do |workerlinux|
        workerlinux.vm.box = "ubuntu/trusty64"
        workerlinux.vm.hostname = "workerlinux01"
        workerlinux.ssh.insert_key = false
        workerlinux.vm.network "forwarded_port", guest: 22, host: 2232, host_ip: "127.0.0.1", id: "ssh"
 
        workerlinux.vm.network "private_network", ip: "172.16.2.11"
 
        workerlinux.vm.provider :virtualbox do |virtualbox|
            virtualbox.name = "WorkerLinuxUbuntu"
            virtualbox.gui = true
            virtualbox.memory = 2048
            virtualbox.cpus = 2
            virtualbox.customize ["modifyvm", :id, "--ioapic", "on"]
            virtualbox.customize ["modifyvm", :id, "--vram", "16"]
        end
    end

The first configuration statements are the usual ones, like configuring the Vagrant box to use or the VM´s hostname. But the forwarded port configuration is made explicit because we need to rely on the exact port later in our Ansible scripts. This wouldn´t be possible with Vagrant´s default port collision auto-correction: since a port on your host machine can´t be used more than once, Vagrant would automatically set it to a random value – and we wouldn´t be able to access our boxes later with Ansible.

To define and override the SSH port of a preconfigured Vagrant box, we need to know the id used to define it in the base box. For Linux boxes this is ssh – and for Windows it is winrm-ssl (which I found to be slightly under-documented…).

Networking between the Vagrant boxes

The next tricky part is the network configuration between the Vagrant boxes. As they need to talk to each other and also to the host, so-called host-only networking should be the way to go here (there´s a really good overview in this post, German only). Host-only networking is easily established using Vagrant´s Private Networks configuration.

And as we want to access our boxes with a static IP, we leverage the Vagrant configuration around Vagrant private networking. All that´s needed here is a line like this inside every Vagrant box definition of our multi-machine setup:

masterlinux.vm.network "private_network", ip: "172.16.2.10"

The same goes for the Windows boxes. Vagrant will tell VirtualBox to create a new separate network (mostly vboxnet1 or similar), put a second virtual network device into every box and assign it the static IP we configured in our Vagrantfile. That´s pretty much everything – except for Windows Server. 🙂 But we´ll take care of that soon.

Ansible access to the Vagrant boxes

Starting with the provisioning of multiple Vagrant boxes, the first approach might be to use Vagrant´s Ansible Provisioner and just have something like the following statement in your Vagrantfile:

  config.vm.provision "ansible" do |ansible|
      ansible.playbook = "playbook.yml"
  end

But remember the purpose of this article: We want to initialize a Docker Swarm later using Ansible. And as this process involves generating and exchanging Join Tokens between the different Vagrant boxes, we need one central Ansible script to share these tokens. If we separated our Ansible scripts into as many scripts as our cluster has machines (here: four), we would lose many of Ansible´s advantages and wouldn´t be able to share the tokens. Additionally, it would be great if we could fire up our entire application with one Ansible command, no matter whether it´s distributed over a hybrid cluster of Windows and Linux machines.

So we want one Ansible playbook that´s able to manage all nodes for us. But there´s a trap: using the same host in multiple groups is possible with Ansible, but all the inventory and group variables will be merged automatically, because Ansible is designed to do that based on the host´s name. So please don´t do the following:

[masterwindows]
127.0.0.1
 
[masterlinux]
127.0.0.1
 
[workerwindows]
127.0.0.1

We somehow need to give Ansible a different hostname for each of our servers, although they are all local and share the same IP. And since a setup on real staging or production infrastructure wouldn´t have this problem any more, we only need a solution for our local development environment with Vagrant. There´s a quite simple one: just edit your /etc/hosts on macOS/Linux or C:\Windows\system32\drivers\etc\hosts on Windows and add the following entries:

127.0.0.1 masterlinux01
127.0.0.1 workerlinux01
127.0.0.1 masterwindows01
127.0.0.1 workerwindows01

This is a small step we have to do by hand, but you can work around it if you want. There are Vagrant plugins like vagrant-hostmanager that let you define these hosts file entries based on the config.vm.hostname configuration in your Vagrantfile. But this requires you to input your admin password every time you run vagrant up, which is also quite manual. Another alternative would have been to use the static IPs we configured in our host-only network. But it is really nice to see aliases like masterlinux01 or workerwindows01 being provisioned in the Ansible playbook runs – you always know which machine is currently in action 🙂

Now we´re where we wanted to be: we have a Vagrant multi-machine setup in place that fires up a mixed OS cluster with a simple command. All we have to do is run the well-known vagrant up:

vagrant up showing four machines

Just be sure to have at least 8 GB of RAM to spare, because every box has 2048 MB configured. You could also tweak that configuration in the Vagrantfile – but don´t go too low 🙂 And if you want to take a break or your notebook is running hot, just type vagrant halt and the whole zoo of machines will be stopped for you.

Provisioning Windows & Linux machines inside one Ansible playbook

Now let´s hand over the baton to Ansible. But as you may have already guessed: The tricky part is to configure Ansible in a way that enables it to provision both Windows and Linux machines inside one playbook. As we already found out, Ansible is not only able to provision Linux machines, but also doesn´t shrink back from Windows boxes.

But handling Windows and Linux inside the same playbook requires a configuration option that enables Ansible to access Linux machines via SSH and Windows machines via WinRM. The key configuration parameter to success here really is ansible_connection. Handling both operating systems with Ansible at the same time isn´t really well documented – but it´s possible. Let´s have a look at how this blog post´s setup handles this challenge, beginning with the hostsfile:

[masterwindows]
masterwindows01
 
[masterlinux]
masterlinux01
 
[workerwindows]
workerwindows01
 
[workerlinux]
workerlinux01
 
[linux:children]
masterlinux
workerlinux
 
[windows:children]
masterwindows
workerwindows

The first four definitions simply map our Vagrant box machine names (which we defined inside our hosts file) to the four possible categories in a Windows/Linux mixed OS environment. As already said, these are Manager/Master nodes (masterwindows and masterlinux) and Worker nodes (workerwindows and workerlinux), each on Windows and Linux. The last two entries bring Ansible´s “group of groups” feature into the game. As all the machines in the groups masterlinux and workerlinux are based on Linux, we use the suffix :children to configure them as belonging to the supergroup linux. The same procedure applies to the windows group of groups.

This gives us the following group variables structure:

Ansible group_vars structure to handle Windows and Linux in parallel
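
In directory form, this boils down to one YAML file per group from the hostsfile (the exact file list shown here is inferred from the files mentioned in this post):

```
step4-windows-linux-multimachine-vagrant-docker-swarm-setup/
└── group_vars/
    ├── all.yml              # vars for every machine (user & password)
    ├── linux.yml            # ansible_connection: ssh
    ├── windows.yml          # ansible_connection: winrm + cert validation
    ├── masterlinux.yml      # ansible_port for this machine
    ├── masterwindows.yml
    ├── workerlinux.yml
    └── workerwindows.yml
```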

The all.yml holds configuration that should be applied to all machines in our cluster, regardless of whether they are Windows or Linux boxes. And as the user and password are always the same with Vagrant boxes, we configure them there:

ansible_user: vagrant
ansible_password: vagrant

In the windows.yml and linux.yml we finally use the mentioned ansible_connection configuration option to distinguish between both connection types. The linux.yml is simple:

ansible_connection: ssh

Besides the needed protocol definition through ansible_connection, the windows.yml adds a second configuration option for the WinRM connection to handle self-signed certificates:

ansible_connection: winrm
ansible_winrm_server_cert_validation: ignore

The last thing to configure so that Ansible is able to access our Vagrant boxes is the correct port configuration. Let´s have a look into workerwindows.yml:

ansible_port: 55996

We need this configuration for every machine in the cluster. To be 100 % sure which port Vagrant forwards for SSH or WinRM on a specific machine, we configure it explicitly inside the Vagrantfile. As already mentioned in the paragraph “A multi-machine Windows & Linux mixed OS Vagrant setup for Docker Swarm” above, this is done through a forwarded_port configuration (always remember to use the correct id: "ssh" (Linux) or id: "winrm-ssl" (Windows)):

workerwindows.vm.network "forwarded_port", guest: 5986, host: 55996, host_ip: "127.0.0.1", id: "winrm-ssl"

With this configuration, we´re finally able to access both Windows and Linux boxes within one Ansible playbook. Let´s try it! Just be sure all the machines in the cluster are fired up via vagrant up. To try the Ansible connectivity, e.g. to the Windows Worker node, run the following:

ansible workerwindows -i hostsfile -m win_ping

Testing the Ansible connectivity to a Linux node, e.g. the Linux Manager node, is nearly as easy:

ansible masterlinux -i hostsfile -m ping

Only on the first run do you need to wrap the command with setting and unsetting an environment variable that enables Ansible to add the new Linux host to its known hosts. So on the first run, instead of firing up just one command, execute these three (as recommended here):

export ANSIBLE_HOST_KEY_CHECKING=False
ansible masterlinux -i hostsfile -m ping
unset ANSIBLE_HOST_KEY_CHECKING

If you don´t want the hassle of generating keys, you may want to install sshpass (e.g. via brew install https://raw.githubusercontent.com/kadwanev/bigboybrew/master/Library/Formula/sshpass.rb on a Mac, as there´s no plain brew install sshpass). In this case, you should also set and unset the environment variable as described.

And voilà: We now have Ansible configured in a way that we can control and provision our cluster with only one playbook.

swarm-ansible-connectivity

logo sources: Windows icon, Linux logo, Packer logo, Vagrant logo, VirtualBox logo, Ansible logo

Prepare Docker engines on all nodes

Ansible is now able to connect to every box of our multi-machine Vagrant setup. Two steps are left: first we need to install and configure Docker on all nodes, so that we can then initialize our Docker Swarm in the second step.

Therefore, the example project´s GitHub repository contains two main Ansible playbooks, prepare-docker-nodes.yml and initialize-docker-swarm.yml. The first one does all the groundwork needed to initialize a Docker Swarm successfully afterwards, which is done in the second one. So let´s have a more detailed look at what´s going on inside these two scripts!

As Ansible empowers us to abstract away the gory details, we should be able to understand what´s going on inside prepare-docker-nodes.yml:

- hosts: all
 
  tasks:
  - name: Checking Ansible connectivity to Windows nodes
    win_ping:
    when: inventory_hostname in groups['windows']
 
  - name: Checking Ansible connectivity to Linux nodes
    ping:
    when: inventory_hostname in groups['linux']
 
  - name: Allow Ping requests on Windows nodes (which is by default disabled in Windows Server 2016)
    win_shell: "netsh advfirewall firewall add rule name='ICMP Allow incoming V4 echo request' protocol=icmpv4:8,any dir=in action=allow"
    when: inventory_hostname in groups['windows']
 
- name: Prepare Docker on Windows nodes
  include: "../step1-prepare-docker-windows/prepare-docker-windows.yml host=windows"
 
- name: Prepare Docker on Linux nodes
  include: prepare-docker-linux.yml host=linux
 
- name: Allow local http Docker registry
  include: allow-http-docker-registry.yml

This blog post always tries to outline a fully comprehensible setup. So if you want to give it a try, just run the playbook inside the step4-windows-linux-multimachine-vagrant-docker-swarm-setup directory:

ansible-playbook -i hostsfile prepare-docker-nodes.yml

While the complete playbook executes, we should continue to dive into its structure. The first line is already quite interesting: with the hosts: all configuration we tell Ansible to use all configured hosts simultaneously. This means the script will be executed on masterlinux01, masterwindows01, workerlinux01 and workerwindows01 in parallel. The following two tasks represent a best practice with Ansible: always check the connectivity to all machines at the beginning – and stop the provisioning if a machine isn´t reachable.

Connectivity Test

As the Ansible modules for Linux and Windows are separated by design and incompatible with each other, we always need to know on what kind of servers we want to execute our scripts. We can use Ansible conditionals with the when statement for that. The conditional

when: inventory_hostname in groups['linux']

ensures that the current Ansible module is only executed on machines that are listed in the group linux. And as we defined the subgroups masterlinux and workerlinux as children of linux, only the hosts masterlinux01 and workerlinux01 are used here; masterwindows01 and workerwindows01 are skipped. Obviously the opposite is true when we use the following conditional:

when: inventory_hostname in groups['windows']

The next task is a Windows Server 2016 exclusive. Because we want our Vagrant boxes to be accessible from each other, we have to allow the very basic command everybody starts with: ping. It is blocked by the Windows firewall by default, and we have to allow it with the following PowerShell command:

  - name: Allow Ping requests on Windows nodes (which is by default disabled in Windows Server 2016)
    win_shell: "netsh advfirewall firewall add rule name='ICMP Allow incoming V4 echo request' protocol=icmpv4:8,any dir=in action=allow"
    when: inventory_hostname in groups['windows']

The following tasks finally install Docker on all of our nodes. Luckily, we can rely on work that was already done here. The post Running Spring Boot Apps on Docker Windows Containers with Ansible: A Complete Guide incl Packer, Vagrant & Powershell elaborates in depth on how to prepare Docker on Windows. The only thing we have to do here is re-use that Ansible script with host=windows appended:

- name: Prepare Docker on Windows nodes
  include: "../step1-prepare-docker-windows/prepare-docker-windows.yml host=windows"

The Linux counterpart is a straightforward Ansible implementation of the official “Get Docker CE for Ubuntu” Guide. The called prepare-docker-linux.yml is included from the main playbook with the host=linux setting:

- name: Prepare Docker on Linux nodes
  include: prepare-docker-linux.yml host=linux

If you want to use a different Linux distribution, just add the appropriate statements inside prepare-docker-linux.yml or search for an appropriate role on Ansible Galaxy.
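
The actual tasks live in prepare-docker-linux.yml in the repository; here´s a condensed sketch of what the Ubuntu part looks like when following that guide (the task names and the hosts: "{{ host }}" pattern are illustrative, not copied verbatim from the repo):

```yaml
- hosts: "{{ host }}"
  become: yes
  tasks:
    - name: Add Docker's official GPG key
      apt_key:
        url: https://download.docker.com/linux/ubuntu/gpg
        state: present

    - name: Add the Docker CE stable apt repository
      apt_repository:
        repo: "deb https://download.docker.com/linux/ubuntu {{ ansible_distribution_release }} stable"
        state: present

    - name: Install Docker CE
      apt:
        name: docker-ce
        update_cache: yes
        state: present
```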

Allowing http-based local Docker Registries

The last task in the prepare-docker-nodes.yml playbook may seem rather surprising at first. The reason for it is simple: we can´t follow our old approach of building our Docker images on a single Docker host any more – that would force us to build each image on every one of our cluster´s nodes again and again, which leads to heavy overhead. A different approach is needed here: with the help of a local Docker registry, we only need to build an image once and push it to the registry. Then the image is ready to run on all of our nodes.

How to run a Docker registry will be covered in a later step, but we have to do some groundwork here already. The simplest possible solution is to start with a plain HTTP registry, which shouldn´t be a big security risk inside our isolated environment and also in many on-premises installations. Just be sure to upgrade to HTTPS with TLS certificates if you´re going into the cloud or if you want to provide your registry to users outside the Docker Swarm.

Every Docker engine has to be configured to allow interaction with a plain HTTP registry. Therefore we have to add a daemon.json file into the appropriate folder, containing the following entry:

{
  "insecure-registries" : ["172.16.2.10:5000"]
}

As we want to run our Docker Swarm local registry on the Linux Manager node, we configure its IP address 172.16.2.10 here. Remember this address was itself configured inside the Vagrantfile.

But since we´re using Ansible, this step is also fully automated inside the included playbook allow-http-docker-registry.yml – including the correct daemon.json paths:

  - name: Template daemon.json to /etc/docker/daemon.json on Linux nodes for later Registry access
    template:
      src: "templates/daemon.j2"
      dest: "/etc/docker/daemon.json"
    become: yes
    when: inventory_hostname in groups['linux']
 
  - name: Template daemon.json to C:\ProgramData\docker\config\daemon.json on Windows nodes for later Registry access
    win_template:
      src: "templates/daemon.j2"
      dest: "C:\\ProgramData\\docker\\config\\daemon.json"
    when: inventory_hostname in groups['windows']

After that last step we now have every node ready with a running Docker engine and are finally able to initialize our Swarm.

Swarm Docker Prepared

logo sources: Windows icon, Linux logo, Vagrant logo, VirtualBox logo, Ansible logo, Docker logo

Initializing a Docker Swarm

Wow, this was quite a journey until we finally got where we wanted to be in the first place. Since Docker is now prepared on all nodes, we can continue with the mentioned second part of the example project´s GitHub repository. The playbook initialize-docker-swarm.yml contains everything that´s needed to initialize a fully functional Docker Swarm. So let´s have a look at how this is done:

- hosts: all
  vars:
    masterwindows_ip: 172.16.2.12
 
  tasks:
  - name: Checking Ansible connectivity to Windows nodes
    win_ping:
    when: inventory_hostname in groups['windows']
 
  - name: Checking Ansible connectivity to Linux nodes
    ping:
    when: inventory_hostname in groups['linux']
 
  - name: Open Ports in firewalls needed for Docker Swarm
    include: prepare-firewalls-for-swarm.yml
 
  - name: Initialize Swarm and join all Swarm nodes
    include: initialize-swarm-and-join-all-nodes.yml
 
  - name: Label underlying operation system to each node
    include: label-os-specific-nodes.yml
 
  - name: Run Portainer as Docker and Docker Swarm Visualizer
    include: run-portainer.yml
 
  - name: Run Docker Swarm local Registry
    include: run-swarm-registry.yml
 
  - name: Display the current Docker Swarm status
    include: display-swarm-status.yml

Before we go into any more detail, let´s run this playbook as well:

ansible-playbook -i hostsfile initialize-docker-swarm.yml

We´ll return to our fully initialized and running Docker Swarm cluster after we´ve had a look into the details of this playbook. 🙂 The first two tasks are already familiar to us. Remember that connectivity checks should always be the first thing to do. After these checks, the prepare-firewalls-for-swarm.yml playbook opens up the ports essential for the later running Swarm. This part is mentioned pretty much at the end of the Docker docs, if you read them through. There are basically three firewall configurations needed. TCP port 2377 is needed to allow all Docker Swarm nodes to connect to the Windows Manager node, where we will initialize our Swarm later on. Therefore we use the conditional when: inventory_hostname in groups['masterwindows'], which means this port is only opened up on the Windows Manager node. The other two configurations are mentioned in the docs:

“[…] you need to have the following ports open between the swarm nodes before you enable swarm mode.”

So we need to do this even before initializing our Swarm! These are TCP/UDP port 7946 for Docker Swarm container network discovery and UDP port 4789 for Docker Swarm overlay network traffic.
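
A sketch of what prepare-firewalls-for-swarm.yml can look like for these two port groups – the ufw module on the Linux side and netsh on the Windows side are assumptions here, the repository´s actual tasks may differ:

```yaml
- name: Open Swarm discovery & overlay network ports on Linux nodes
  ufw:
    rule: allow
    port: "{{ item.port }}"
    proto: "{{ item.proto }}"
  become: yes
  with_items:
    - { port: "7946", proto: "tcp" }
    - { port: "7946", proto: "udp" }
    - { port: "4789", proto: "udp" }
  when: inventory_hostname in groups['linux']

- name: Open Swarm discovery & overlay network ports on Windows nodes
  win_shell: "netsh advfirewall firewall add rule name='Docker Swarm {{ item.proto }} {{ item.port }}' dir=in action=allow protocol={{ item.proto }} localport={{ item.port }}"
  with_items:
    - { port: "7946", proto: "TCP" }
    - { port: "7946", proto: "UDP" }
    - { port: "4789", proto: "UDP" }
  when: inventory_hostname in groups['windows']
```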

Join the Force… erm, Swarm!

The following task of our main initialize-docker-swarm.yml includes the initialize-swarm-and-join-all-nodes.yml playbook and does the heavy lifting needed to initialize a Docker Swarm with Ansible. Let´s go through all the steps in detail:

- name: Leave Swarm on Windows master node, if there was a cluster before
  win_shell: "docker swarm leave --force"
  ignore_errors: yes
  when: inventory_hostname == "masterwindows01"
 
- name: Initialize Docker Swarm cluster on Windows master node
  win_shell: "docker swarm init --advertise-addr={{masterwindows_ip}} --listen-addr {{masterwindows_ip}}:2377"
  ignore_errors: yes
  when: inventory_hostname == "masterwindows01"
 
- name: Pause a few seconds after new Swarm cluster initialization to prevent later errors on obtaining tokens too early
  pause:
    seconds: 5
...

If you´re a frequent reader of this blog post series, you´re already aware that Ansible playbooks often contain steps that are irrelevant for the first execution. And leaving the Swarm in the first step is such a case. If you run the playbook a second time, you´ll know what that is all about. It´s not a problem that this step fails on the first execution – the ignore_errors: yes configuration takes care of that.

The magic follows in the next step. It runs the command needed to initialize a leading Docker Swarm Manager node – and we chose our Windows Manager node for that role. Both advertise-addr and listen-addr have to be set to the Windows Manager node in this case. As the initialization process of a Swarm takes some time, this step is followed by a pause module: we just give our Swarm a few seconds to get itself together.

The reason for this pause lies in the following two steps, which obtain the join tokens needed later (these steps occasionally fail if you run them right after the docker swarm init step). The commands to get these tokens are docker swarm join-token worker -q for worker nodes and docker swarm join-token manager -q for manager nodes.
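As an alternative to the fixed pause, the token retrieval itself could be retried with Ansible's until/retries mechanism. This is just a sketch on top of the original playbook, not part of it:

```yaml
# Sketch: retry obtaining the worker join-token instead of relying on a pause
- name: Obtain worker join-token from Windows master node
  win_shell: "docker swarm join-token worker -q"
  register: worker_token_result
  until: worker_token_result.rc == 0
  retries: 5
  delay: 3
  when: inventory_hostname == "masterwindows01"
```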

...
- name: Obtain worker join-token from Windows master node
  win_shell: "docker swarm join-token worker -q"
  register: worker_token_result
  ignore_errors: yes
  when: inventory_hostname == "masterwindows01"
 
- name: Obtain manager join-token from Windows master node
  win_shell: "docker swarm join-token manager -q"
  register: manager_token_result
  ignore_errors: yes
  when: inventory_hostname == "masterwindows01"
 
- name: Syncing the worker and manager join-token results to the other hosts
  set_fact:
    worker_token_result_host_sync: "{{ hostvars['masterwindows01']['worker_token_result'] }}"
    manager_token_result_host_sync: "{{ hostvars['masterwindows01']['manager_token_result'] }}"
 
- name: Extracting and saving worker and manager join-tokens in variables for joining other nodes later
  set_fact:
    worker_jointoken: "{{worker_token_result_host_sync.stdout.splitlines()[0]}}"
    manager_jointoken: "{{manager_token_result_host_sync.stdout.splitlines()[0]}}"
 
- name: Join-tokens...
  debug:
    msg:
      - "The worker join-token is: '{{worker_jointoken}}'"
      - "The manager join-token is: '{{manager_jointoken}}'"
...

As both steps run only on the host masterwindows01, scoped via the conditional when: inventory_hostname == "masterwindows01", their results are not directly available on the other hosts. But since the other hosts need the tokens to be able to join the Swarm, we have to “synchronize” them with the help of the set_fact Ansible module, defining variables that are assigned the join tokens. To access the tokens from masterwindows01, we use the following trick:

worker_token_result_host_sync: "{{ hostvars['masterwindows01']['worker_token_result'] }}"

The hostvars['masterwindows01'] statement gives us access to the variables of masterwindows01. The trailing ['worker_token_result'] points to the registered result of the docker swarm join-token command. Inside the following set_fact module, the only value needed is extracted with worker_token_result_host_sync.stdout.splitlines()[0]. Looking at the console output, the debug module prints all the extracted tokens for us.
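If you want to fail fast when the extraction goes wrong, a defensive check could be added right after the set_fact steps. Again, this is just a sketch and not part of the original playbook; Swarm join tokens always start with the SWMTKN-1- prefix:

```yaml
# Sketch: verify the extracted tokens look like real Swarm join-tokens
- name: Verify that the extracted join-tokens are plausible
  assert:
    that:
      - worker_jointoken is match("^SWMTKN-1-")
      - manager_jointoken is match("^SWMTKN-1-")
    msg: "Join-token extraction failed, maybe the Swarm needed more time after init?"
```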

Synchronizing Swarm join tokens

Now we're able to join all the other nodes to our Swarm, which again is prefixed with leaving any previous Swarm, irrelevant on the first execution of the playbook. To join a worker to the Swarm, the docker swarm join --token {{worker_jointoken}} {{masterwindows_ip}} command has to be executed. To join a new manager, the very similar docker swarm join --token {{manager_jointoken}} {{masterwindows_ip}} is needed.

...
- name: Leave Swarm on Windows worker nodes, if there was a cluster before
  win_shell: "docker swarm leave"
  ignore_errors: yes
  when: inventory_hostname in groups['workerwindows']
 
- name: Add Windows worker nodes to Docker Swarm cluster
  win_shell: "docker swarm join --token {{worker_jointoken}} {{masterwindows_ip}}"
  ignore_errors: yes
  when: inventory_hostname in groups['workerwindows']
 
- name: Leave Swarm on Linux worker nodes, if there was a cluster before
  shell: "docker swarm leave"
  ignore_errors: yes
  when: inventory_hostname in groups['workerlinux']
 
- name: Add Linux worker nodes to Docker Swarm cluster
  shell: "docker swarm join --token {{worker_jointoken}} {{masterwindows_ip}}"
  ignore_errors: yes
  when: inventory_hostname in groups['workerlinux']
 
- name: Leave Swarm on Linux manager nodes, if there was a cluster before
  shell: "docker swarm leave --force"
  ignore_errors: yes
  when: inventory_hostname in groups['masterlinux']
 
- name: Add Linux manager nodes to Docker Swarm cluster
  shell: "docker swarm join --token {{manager_jointoken}} {{masterwindows_ip}}"
  ignore_errors: yes
  when: inventory_hostname in groups['masterlinux']
...

At this point, we have already initialized a fully functional Docker Swarm!
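To double-check this from within the playbook, a small verification step could list all Swarm nodes on the leading manager. This is a sketch, not part of the original playbook:

```yaml
# Sketch: print the Swarm's node list on the leading Windows manager
- name: List all Docker Swarm nodes on the Windows master node
  win_shell: "docker node ls"
  register: node_ls_result
  when: inventory_hostname == "masterwindows01"

- name: Show the Swarm node list
  debug:
    msg: "{{ node_ls_result.stdout_lines }}"
  when: inventory_hostname == "masterwindows01"
```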

Swarm initialized

logo sources: Windows icon, Linux logo, Vagrant logo, VirtualBox logo, Ansible logo, Docker & Docker Swarm logo

Congratulations! 🙂 But why do we need a few more steps?

Visualize the Swarm with Portainer

It's always good to know what's going on inside our Swarm! We are already able to obtain all the information via the Docker Swarm CLI, e.g. through docker service ls or docker service ps [yourServiceNameHere]. But it won't hurt to also have a visual equivalent in place.

Docker's own Swarm visualizer doesn't look that neat compared to another tool called Portainer. There's a good comparison available on StackShare if you're interested. To me, Portainer seems to be the right choice when it comes to Docker and Docker Swarm visualization. And as soon as I read the following quote, I had to get my hands on it:

“[Portainer] can be deployed as Linux container or a Windows native container.”

The Portainer configuration is already included in this setup. The run-portainer.yml does everything that's needed:

- name: Create directory for later volume mount into Portainer service on Linux Manager node if it doesn't exist
  file:
    path: /mnt/portainer
    state: directory
    mode: 0755
  when: inventory_hostname in groups['linux']
  become: yes
 
- name: Run Portainer Docker and Docker Swarm Visualizer on Linux Manager node as Swarm service
  shell: "docker service create --name portainer --publish 9000:9000 --constraint 'node.role == manager' --constraint 'node.labels.os==linux' --mount type=bind,src=/var/run/docker.sock,dst=/var/run/docker.sock --mount type=bind,src=/mnt/portainer,dst=/data portainer/portainer:latest -H unix:///var/run/docker.sock"
  ignore_errors: yes
  when: inventory_hostname == "masterlinux01"

This deploys a Portainer instance onto our Linux manager node and connects it directly to the Swarm. For more details, see the Portainer docs. But there's one thing that could lead to frustration: use a current browser to access the Portainer UI from inside your Windows boxes! It doesn't work in the pre-installed Internet Explorer! Just head to http://172.16.2.10:9000 if you want to access Portainer from within the cluster.
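If you want the playbook itself to wait until Portainer is actually up, something like the following sketch could be appended to run-portainer.yml. It's not part of the original setup and assumes 172.16.2.10 (our Linux manager node) is reachable from the Ansible control machine:

```yaml
# Sketch: block until the Portainer UI answers on port 9000
- name: Wait for the Portainer UI to become reachable
  wait_for:
    host: 172.16.2.10
    port: 9000
    timeout: 120
  delegate_to: localhost
  when: inventory_hostname == "masterlinux01"
```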

But as we have the port forwarding configuration masterlinux.vm.network "forwarded_port", guest: 9000, host: 49000, host_ip: "127.0.0.1", id: "portainer" inside our Vagrantfile, we can also access the Portainer UI from our Vagrant host by simply pointing our browser to http://localhost:49000/:

Portainer Docker Swarm visualizer Dashboard

Run a local Registry as Docker Swarm service

As already stated in the paragraph “Allowing http-based local Docker Registries”, we configured every Docker engine on every Swarm node to access http-based Docker registries. Although a local registry is only relevant for later application deployments, it's something of a basic step when it comes to initializing a Docker Swarm cluster. So let's start our Docker Swarm registry service as described in the docs. There were some errors in those docs that should be fixed by now (docker.github.io/pull/4465, docker.github.io/pull/4641 & docker.github.io/pull/4644). Everything needed is done inside the run-swarm-registry.yml:

- name: Specify to run Docker Registry on Linux Manager node
  shell: "docker node update --label-add registry=true masterlinux01"
  ignore_errors: yes
  when: inventory_hostname == "masterlinux01"
 
- name: Create directory for later volume mount into the Docker Registry service on Linux Manager node if it doesn't exist
  file:
    path: /mnt/registry
    state: directory
    mode: 0755
  when: inventory_hostname in groups['linux']
  become: yes
 
- name: Run Docker Registry on Linux Manager node as Swarm service
  shell: "docker service create --name swarm-registry --constraint 'node.labels.registry==true' --mount type=bind,src=/mnt/registry,dst=/var/lib/registry -e REGISTRY_HTTP_ADDR=0.0.0.0:5000 -p 5000:5000 --replicas 1 registry:2"
  ignore_errors: yes
  when: inventory_hostname == "masterlinux01"

As we want to run our registry on the Linux manager node, we first need to set a label on it. This is done with the docker node update --label-add command. Then we create a mount point on the Linux manager Docker host for later use by the registry container. The last step is the crucial one: it creates a Docker Swarm service running our local registry, configured to listen on port 5000 and to run only on the Linux manager node with the help of --constraint 'node.labels.registry==true'.
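Once the registry service is up, a later application deployment could tag and push its images against it. The following sketch is not part of the original playbooks, and the image name myapp is just a placeholder; since the service is published via the Swarm's routing mesh, port 5000 is reachable through any node IP, here the Linux manager's 172.16.2.10:

```yaml
# Sketch: tag and push an example image into the new local Swarm registry
- name: Tag example image for the local Swarm registry
  shell: "docker tag myapp:latest 172.16.2.10:5000/myapp:latest"
  when: inventory_hostname == "masterlinux01"

- name: Push example image to the local Swarm registry
  shell: "docker push 172.16.2.10:5000/myapp:latest"
  when: inventory_hostname == "masterlinux01"
```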

If you manually check the Swarm's services after this command, you'll notice a running Swarm service called swarm-registry. Or we could simply go to our Portainer UI at http://localhost:49000/ and have a look:

Portainer Swarm services overview

The Swarm is ready for action!

We've reached the last step in our playbook initialize-docker-swarm.yml. The final task includes the playbook display-swarm-status.yml, which doesn't really change anything on our machines, but outputs the current Swarm status to the console that executes our playbook:

Swarm final status

This means our Docker Swarm cluster is ready for the deployment of our applications! Wow, this was quite a journey we made in this post. But I think we've already reached a lot of our goals. Again, we have a completely comprehensible setup in place. Messing something up on the way to a running Swarm is no longer a problem: just delete everything and start fresh! As we use the “Infrastructure as Code” paradigm here, everything is automated and 100% transparent. Just have a look into the GitHub repository or the command line output. So no “I played around with Swarm and everything worked out on my machine” speeches. This setup works. And if not, fix it with a pull request. 🙂 I can't emphasize this enough given the fast-moving development of Docker Windows containers right now.

This brings us to the second goal: we have a fully functional mixed-OS hybrid Docker Swarm cluster in place that provides every foundation our applications inside Docker containers could possibly need, be they native Windows or native Linux. And by leveraging the power of Vagrant's multi-machine setups, everything can be executed locally on your laptop while at the same time remaining open to any cloud solution out there. So this setup provides us with a local environment that is as close to staging or even production as possible.

So what's left? We haven't deployed an application so far! We certainly want to deploy a lot of microservices to our Docker Swarm cluster and have them scaled out automatically. We also need to know how we can access applications running inside the Swarm and how to do things like rolling updates without causing any downtime for our consumers. There are many things left to talk about, but maybe there will be a second part to this blog post.

Jonas Hecht

Trying to bridge the gap between software architecture and hands-on coding, Jonas works at codecentric. He has deep knowledge of all kinds of enterprise software development, paired with a passion for new technology.
