
Microservices with Nomad and Consul


Companies want to deliver software faster and deploy different parts of their systems autonomously, so they are splitting their existing monoliths into smaller services. Splitting a monolith is an ambitious project: it requires identifying the correct service boundaries and cross-cutting concerns. And as the number of microservices grows, operational problems surface that did not exist before. Some of these problems are:

  • How to dynamically distribute running services while using the available hardware resources as efficiently as possible?
  • How to get resource isolation (CPU, memory, disk I/O, network, etc.) when multiple services are running on the same host?
  • How to handle service discovery and failure detection?
  • How to share configuration between multiple services?

To get a better overview of the first two problems, we have illustrated them in the picture below, assuming that we want to deploy three new instances of the microservice ‘D’. The new microservice ‘D’ requires 1 GB RAM and at least 500 MHz CPU (see picture below). We currently have 4 available nodes, each already running a different number of microservices, so the 3 new instances have to be placed taking the currently available CPU and RAM on each node into account. We also need to be able to limit the resources each microservice uses: a misbehaving microservice could otherwise consume all of the memory or CPU of its node. So we have an additional requirement, resource isolation on the service level.

Example of resource deployment manager

A possible solution for these two problems is to use an orchestration and resource manager, which helps us bring some order into the chaos. The ‘usual suspects’ for this job are Kubernetes, Docker Swarm, or a combination of Mesos, Marathon, and DC/OS. If we decide not to use Docker for packaging and deploying the microservices, or realize that most of our hosts run a kernel version not suitable for Docker at all, then we must be able to run binaries or fat JARs that are not packaged as Docker containers. In this case, the selection is reduced to Mesos and Marathon. Setting up and maintaining any of these solutions in a private datacenter is a challenging task for any organization. Besides these well-known products, there is an additional solution on the market: HashiCorp’s cluster manager Nomad.

What is Nomad?

Nomad’s primary task is to manage a cluster of machines and run different types of applications on them. It integrates very well with Consul, HashiCorp’s service discovery and configuration tool, and the two provide a complementary feature set. Let’s focus on the key features that actually distinguish Nomad from most other platforms:

  1. Flexible workloads: Nomad’s most interesting feature is its ability to run different kinds of applications: Docker containers, ‘normal Linux processes’, Java applications, Apache Spark jobs, rkt containers, and even virtual machine images (e.g. qcow, img, iso) with the QEMU driver.
  2. Simplicity: Like all HashiCorp products, Nomad ships as a single binary, for both clients and servers.
  3. Distributed and highly available, even across multiple datacenters: For leader election and state replication, Nomad uses the Raft consensus protocol; for cluster membership and failure detection, it relies on Serf, HashiCorp’s lightweight gossip protocol. Multiple datacenters can be managed as part of a larger region, and jobs can be scheduled across datacenters.

Nomad provides additional features and we could continue the list, but the greatest advantage is the flexible workload support: we are not limited to Docker containers, as we are on some other platforms. Let’s dig a little deeper and see how this is realized and implemented in Nomad.

It’s all about Jobs, Allocations, and Drivers

Nomad is deployed as a single binary; depending on the configuration, it is started in either client or server mode. Servers are responsible for managing the cluster and for task scheduling.

Nomad Client/Server

The client is a very lightweight process that registers with the servers. Its primary job is to execute the tasks assigned to it by the servers. A regular Nomad cluster setup has at least 3 to 5 servers, which can manage up to thousands of clients. Tasks are defined in a so-called job specification file, written declaratively in HCL (HashiCorp Configuration Language), a vendor-specific format. The job specification contains all the information necessary to run a Nomad job: it declares the tasks to execute, specifies the required resources, and limits job execution to the defined constraints.

This is an example of a job specification file for a Python app:

job "python-app" {
 
  # Run this job as a "service" type. Each job type has different properties
  type = "service"
 
  # A group defines a series of tasks that should be co-located on the same client (host)
  group "server" {
    count = 1
 
    # Create an individual task (unit of work)
    task "python-app" {
      driver = "exec"
 
      # Specifies what should be executed when starting the job
      config {
        command = "/bin/sh"
        args = ["/local/install_run.sh"]
      }
 
      # Defines the source of the artifact which should be downloaded
      artifact {
        source = "https://github.com/tomiloza/nomad-consul-demo/raw/master/apps/python/app.tgz"
      }
 
      # The service block tells Nomad how to register this service with Consul for service discovery and monitoring.
      service {
        name = "python-app"
        port = "http"
 
        check {
          type = "http"
          path = "/"
          interval = "10s"
          timeout = "2s"
        }
      }
 
      # Specify the maximum resources required to run the job, including CPU, memory, and network bandwidth
      resources {
        cpu = 500
        memory = 256
 
        network {
          mbits = 5
 
          port "http" {
            static = 9080
          }
        }
      }
    }
  }
}
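Once saved, such a specification can be submitted with the Nomad CLI. A minimal sketch, assuming the nomad binary is on the path, the demo cluster is running, and the specification above has been saved as python-app.nomad (the file name is an assumption, not something the article prescribes):

```shell
# Assumed file name; save the job specification above into it first
job_file=python-app.nomad

if command -v nomad >/dev/null 2>&1 && [ -f "$job_file" ]; then
  nomad validate "$job_file"   # check the specification for errors
  nomad plan "$job_file"       # dry run: preview the scheduler's action plan
  nomad run "$job_file"        # submit the job to the cluster
fi
```

The plan step is optional but useful: it shows what the scheduler would do without changing anything.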

When we submit a job specification to the Nomad server, we declare a desired state: the job specification defines that we want to run a task. A task is an individual unit of work, such as a Docker container, a web application, or a batch process. This unit of work is deployed a certain number of times and under specific constraints, as in the first example above, where we wanted to deploy 3 new instances of service ‘D’ requiring 1 GB RAM and at least 500 MHz CPU each. The jobs define the desired state; the Nomad server analyzes the actual state and triggers the evaluation process. The picture below shows the evaluation and scheduling process.

Nomad scheduling process

If the current state changes, either through a desired event (deployment of a new job) or an emergent one (a node failure), the evaluation process is triggered. This means that Nomad must evaluate the current state and compare it with the desired state defined in the job specifications. The evaluation is queued into the evaluation broker, which manages the pending evaluations. Processing the evaluations is the responsibility of the schedulers. There are three basic types of schedulers: batch, service, and system. The service scheduler processes service jobs, which are usually long-lived services; the batch scheduler, as expected, processes batch jobs; and the system scheduler processes so-called system jobs, which run on every node. The outcome of the scheduling process is an action plan: it defines the exact actions to be executed to reach the desired state, for example that an allocation should be created, updated, or deleted. An allocation is an isolated environment that is created for a particular job on a Nomad client node. Let’s see how an allocation is constructed and what its building blocks are.
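Which scheduler processes a job is selected with the type attribute in its specification, just like the service type in the example above. A sketch with hypothetical job names:

```hcl
# Handled by the batch scheduler: runs to completion, not restarted on success
job "nightly-report" {
  type = "batch"
  # group/task definitions as in the example above
}

# Handled by the system scheduler: one instance on every eligible client node
job "log-shipper" {
  type = "system"
  # group/task definitions as in the example above
}
```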

Allocations (chroot + cgroups)

We will explain the allocation concept with a real example. For this purpose, we will set up a Vagrant Ubuntu box with Nomad and Consul. The Ansible configuration can be found on my GitHub account: github-tomiloza.

After we provision the VM with Vagrant, we will have Nomad and Consul up and running. The Nomad/Consul agents are started in bootstrap mode, which is basically a hybrid client-server mode and should only be used for testing purposes. Additionally, we will install Hashi-UI, a user interface for Nomad. When the virtual machine is provisioned, we will also have 4 different Nomad jobs running (the Nomad job definitions are located under the jobs directory). The Nomad UI is available on port 3000 on the IP address we defined in the Vagrantfile under the config.vm.network parameter. Navigating to the Nomad UI (link when the VM is started), we can see that we currently have 4 different Nomad jobs running.

Nomad UI

We could also get the same information on the command line in the VM by typing nomad status. In the VM we have triggered 3 different types of jobs: a Docker job, a Java job, and a so-called “isolated fork/exec” job, which is basically an isolated Linux process. Now that we have Nomad with the jobs up and running, we can analyze how these jobs are actually run in the VM. As already mentioned, Nomad creates a corresponding allocation for every job. The directory where Nomad creates these allocations is defined by the data_dir property in the client configuration. In our provisioned VM these allocations are stored under /data/nomad/data/alloc.
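A sketch of that inspection, using the paths and job names from this demo setup (the guards keep it harmless on machines without Nomad):

```shell
# Allocation directory from the demo VM's client configuration (data_dir)
alloc_dir=/data/nomad/data/alloc

if command -v nomad >/dev/null 2>&1; then
  nomad status             # list all registered jobs and their status
  nomad status python-app  # allocations and recent events for one job
fi

# Each allocation gets its own working directory under data_dir
[ -d "$alloc_dir" ] && ls -l "$alloc_dir"

true  # keep the exit status clean when Nomad is not running locally
```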

To get a better overview of how a Nomad job is created and where it stores the necessary files, we will analyze the running Java job. First, we need to get the PID of the running Java process:

ps -axf | grep java | grep -v grep

Once we have found the PID of the Java process, we can cd to the /proc/<pid> directory. By listing the /proc/<pid>/ directory structure, we can see that the root of the process points to the location defined by the data_dir property.

Nomad Chroot Process

This means that the Java process is chrooted. The Java process (like the Python process) is isolated from the rest of the system. The basic idea behind this concept is to copy the system files necessary for running the process to a specific location and then use chroot to change the root directory of the process. The referenced directory becomes the new root of the process; from the process’s point of view, however, the root directory is seemingly still at /. A program run in such a modified environment cannot access files outside the designated directory tree.
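The chroot can be verified directly through the proc filesystem: /proc/<pid>/root is a symlink to the root directory a process actually sees. A sketch, assuming a Linux system (pgrep -f java is an assumption about how the task's process can be found):

```shell
# For a non-chrooted process, such as this shell, the root link points to /
self_root=$(readlink /proc/self/root 2>/dev/null || echo unsupported)
echo "$self_root"

# For the chrooted Java task, the link points into Nomad's allocation directory
pid=$(pgrep -f java 2>/dev/null | head -n 1)
[ -n "$pid" ] && ls -ld "/proc/$pid/root"

true  # keep the exit status clean when no Java process is running
```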

Chroot directory tree

We also have the option to narrow down which binaries are copied into the chroot with Nomad’s chroot_env configuration parameter. In the Nomad job definition, we can add resource restrictions for each particular job: under the resources section it is possible to define memory, IOPS, CPU, and network requirements. To enforce these resource requirements, Nomad uses the Linux cgroups feature. Linux control groups (cgroups) allow us to limit the resources available to a certain process; when we put the chrooted process under a cgroup, we limit the resources the process and its children have access to. For the Nomad Java job, we have defined a memory limit of 256 MB RAM.
We can find out which cgroup is applied to the Java process by taking a look at the /proc/<pid>/cgroup file. With the retrieved cgroup id we can display the value of the memory.limit_in_bytes file for this Java process.

cgroups memory

As expected, the cgroup memory limit in bytes corresponds to the value defined in megabytes in the job specification. With these two basic Linux features, cgroups and chroot, Nomad creates isolated environments (allocations), which allow us to dedicate memory and CPU to a process and jail the process to its own set of binaries. This gives us resource isolation for plain Java or Python apps.
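The numbers are easy to check by hand: the 256 in the job's resources block is in megabytes, while memory.limit_in_bytes reports bytes. A sketch of the lookup (Linux only; the pgrep pattern is an assumption):

```shell
# 256 MB from the job specification, converted to bytes
limit_bytes=$((256 * 1024 * 1024))
echo "$limit_bytes"   # 268435456, the value memory.limit_in_bytes should show

# Which cgroups the Java task belongs to
pid=$(pgrep -f java 2>/dev/null | head -n 1)
[ -n "$pid" ] && cat "/proc/$pid/cgroup"

true  # keep the exit status clean when no Java process is running
```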

Consul integration

Navigating to the Consul UI (link when the VM is started), we can see that all jobs are registered in Consul as services. In the job specification, the service block defines the values needed for the Consul registration. With the services registered in Consul, we get Consul features such as service discovery, failure detection, and configuration sharing.
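The registered services can also be queried through Consul's standard local interfaces, the HTTP API on port 8500 and the DNS interface on port 8600. A sketch against the demo VM (the service name python-app comes from the jobs above; the guards make it a no-op without Consul):

```shell
service=python-app

# HTTP API: catalog entry with node, address, and port of each instance
command -v curl >/dev/null 2>&1 && \
  curl -s "http://127.0.0.1:8500/v1/catalog/service/$service"

# DNS interface: SRV records for all healthy instances of the service
command -v dig >/dev/null 2>&1 && \
  dig @127.0.0.1 -p 8600 "$service.service.consul" SRV

true  # keep the exit status clean when Consul is not running locally
```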

Conclusion

Together, Nomad and Consul are a powerful combination for solving many of the operational problems that come with a microservices architecture. One of the key advantages is the ability to run microservices as plain Java, Python, or Go apps and still get the necessary service and resource isolation. Just to be clear: it is also possible to run fully Dockerized apps, using Nomad’s Docker driver.

Tomislav Lozancic

Tomislav is an IT consultant in codecentric’s Munich office. He feels at home on the JVM, and his areas of expertise are CI/CD, Infrastructure as Code, and DevOps. Currently, Tomislav is particularly interested in cloud computing and cluster platforms such as Nomad and Kubernetes.

