SMACK Stack DC/OS Style

31.7.2016 | 6 minutes of reading time

In the world of the Internet of Things (IoT) you work with a continuous flow of data. There are two ways to handle it: batch processing long after the data has been collected, or analysing the data while it is still being collected. How to implement this second, Fast Data approach is described in the blog post IoT Analytics Platform.

The focus of the previous article was on the SACK part of the SMACK stack: Apache Spark, Akka, Apache Cassandra and Apache Kafka. This article completes the picture of a SMACK stack application and shows how to actually run it with DC/OS and Mesos, which is what the M in SMACK stands for.

Infrastructure

So which infrastructure is needed to run this fine SMACK application?
First of all, Mesos is needed, together with Marathon, Chronos and DC/OS. As Mesos on its own is rather limited and only gets its kicks from working with DC/OS, we'll focus on DC/OS. How to best install DC/OS with Terraform is a topic of its own, so we'll skip that and rely on a pre-configured environment.
Now that the underlying "operating system" is available, let us take a closer look at what else is needed. As the S in SMACK stands for Spark, we need to install the Spark framework in DC/OS.

This Spark framework starts and keeps track of the Apache Spark master server, which will take care of our soon-to-come Spark jobs.
So now we're set with Apache Spark and ready to digest the incoming data, but to do this we need some data to work with. Therefore we need an Apache Kafka server first. That's another easy task: it is installed in the same way as Spark.
Now the consuming side is covered: the Spark job can read from Kafka, but it has no place to put the data yet. Therefore another part of the SMACK stack needs to be installed: Apache Cassandra. (How the data gets into Kafka in the first place is covered further down.)
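The same three installations can also be scripted instead of clicked together. The following is a minimal sketch, assuming the DC/OS CLI is installed and attached to the cluster, and that the packages are published under the names spark, kafka and cassandra. Both the package names and the install_smack_packages helper are assumptions made here, not something taken from the showcase.

```shell
#!/bin/sh
# Hypothetical helper: install the frameworks this showcase relies on.
# The package names are assumed to match the entries in the package repository.
install_smack_packages() {
  for pkg in spark kafka cassandra; do
    # --yes skips the interactive confirmation prompt
    dcos package install --yes "$pkg" || return 1
  done
}
```

Calling install_smack_packages once should leave all three frameworks visible in the DC/OS services overview; the function stops at the first package that fails to install.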

Once Cassandra is installed, it still doesn't know how to store our data, so it's crucial to initialize the database with a schema. Chronos comes to the rescue: we can add a job that does this. But since the job needs a Docker image, we can't create it via the web UI; we're going to use the REST API instead.
To add the new job we need to issue the following curl command:

curl -L -H 'Content-Type: application/json' -X POST -d '{json hash}' master.mesos/service/chronos/scheduler/iso8601

The {json hash} needs to be replaced by the following JSON document, which defines the job. Just make sure that the $CASSANDRA_HOST placeholder is replaced by the actual DNS name of the Cassandra host.


{
    "schedule": "R0/2014-03-08T20:00:00.000Z/PT1M",
    "name": "init_cassandra_schema_job",
    "container": {
        "type": "DOCKER",
        "image": "codecentric/bus-demo-schema",
        "network": "BRIDGE",
        "forcePullImage": true
    },
    "cpus": "0.5",
    "mem": "512",
    "command": "/opt/bus-demo/import_data.sh $CASSANDRA_HOST",
    "uris": []
}

This creates a new job in Chronos, which then shows up in the Chronos overview.

Now make sure that this job is executed at least once to create the Cassandra keyspaces needed for the application.
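To avoid pasting the JSON into the command line by hand, the job definition can be rendered with the host name filled in and piped straight into curl. This is only a sketch: the two function names are made up for illustration, while the JSON and the endpoint are the ones shown above.

```shell
#!/bin/sh
# render_schema_job <cassandra-host>
# Prints the Chronos job definition with the Cassandra host filled in.
render_schema_job() {
  cat <<EOF
{
    "schedule": "R0/2014-03-08T20:00:00.000Z/PT1M",
    "name": "init_cassandra_schema_job",
    "container": {
        "type": "DOCKER",
        "image": "codecentric/bus-demo-schema",
        "network": "BRIDGE",
        "forcePullImage": true
    },
    "cpus": "0.5",
    "mem": "512",
    "command": "/opt/bus-demo/import_data.sh $1",
    "uris": []
}
EOF
}

# submit_schema_job <cassandra-host>
# Posts the rendered definition to the Chronos endpoint used above.
# "-d @-" tells curl to read the request body from stdin.
submit_schema_job() {
  render_schema_job "$1" |
    curl -L -H 'Content-Type: application/json' -X POST -d @- \
      master.mesos/service/chronos/scheduler/iso8601
}
```

Running submit_schema_job with the DNS name of your Cassandra node as its only argument registers the job; render_schema_job on its own is useful for checking the JSON before anything is sent.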

Easiness of Installation

Now that the foundation is running, we need to install our own application. The digesting part, consuming from Kafka and publishing to Cassandra, is now possible. But how do we get our own Docker images, which contain the Akka ingest part, into the cluster? That is quite easy: an application packaged in a Docker image can be installed with Marathon. Let's start by creating a new application in Marathon.

In the dialog for the new application we switch to JSON mode and add the following JSON snippet:


{
  "id": "/ingest",
  "cpus": 1,
  "mem": 2048,
  "disk": 0,
  "instances": 1,
  "container": {
    "type": "DOCKER",
    "volumes": [],
    "docker": {
      "image": "codecentric/bus-demo-ingest",
      "network": "HOST",
      "privileged": false,
      "parameters": [],
      "forcePullImage": true
    }
  },
  "env": {
    "CASSANDRA_HOST": "$CASSANDRA_HOST",
    "KAFKA_HOST": "$KAFKA_HOST",
    "KAFKA_PORT": "$KAFKA_PORT"
  }
}

But beware: the placeholders for the Cassandra and Kafka host and port must be changed to the actual names used in our cluster. The same needs to be done for the Akka HTTP service Docker image, which also contains the front-end part.


{
    "id": "/dashboard",
    "container": {
        "type": "DOCKER",
        "docker": {
            "image": "codecentric/bus-demo-dashboard",
            "network": "HOST",
            "forcePullImage": true
        }
    },
    "acceptedResourceRoles": [
        "slave_public"
    ],
    "env": {
        "CASSANDRA_HOST": "$CASSANDRA_HOST",
        "CASSANDRA_PORT": "9042",
        "KAFKA_HOST": "$KAFKA_HOST",
        "KAFKA_PORT": "9092"
    },
    "healthChecks": [
        {
          "path": "/",
          "protocol": "HTTP",
          "gracePeriodSeconds": 300,
          "intervalSeconds": 60,
          "timeoutSeconds": 20,
          "maxConsecutiveFailures": 3,
          "ignoreHttp1xx": false,
          "port": 8000
        }
    ],
    "cpus": 1,
    "mem": 2048.0,
    "ports": [
        8000
    ]
}

Again, it is crucial to replace the Cassandra and Kafka host name placeholders with the actual values. The cluster now contains the applications for accepting the incoming data and for visualizing it.
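Replacing the placeholders by hand in two files is error-prone, so one option is to keep the app definitions as templates and fill them in just before deployment. The sketch below assumes the two JSON documents above are saved as ingest.json and dashboard.json (file names chosen here for illustration) and that the real values are exported as environment variables of the same names. The helper functions are hypothetical; dcos marathon app add is the CLI counterpart to pasting the JSON into the Marathon UI.

```shell
#!/bin/sh
# fill_placeholders <template-file>
# Prints the template with $CASSANDRA_HOST, $KAFKA_HOST and $KAFKA_PORT
# replaced by the values of the equally named environment variables.
# [$] matches a literal dollar sign, so the JSON keys stay untouched.
fill_placeholders() {
  sed -e 's|[$]CASSANDRA_HOST|'"$CASSANDRA_HOST"'|g' \
      -e 's|[$]KAFKA_HOST|'"$KAFKA_HOST"'|g' \
      -e 's|[$]KAFKA_PORT|'"$KAFKA_PORT"'|g' \
      "$1"
}

# deploy_app <template-file>
# Renders the template into a temporary file and registers it with Marathon.
deploy_app() {
  rendered="$(mktemp)" || return 1
  fill_placeholders "$1" > "$rendered" &&
    dcos marathon app add "$rendered"
  status=$?
  rm -f "$rendered"
  return "$status"
}
```

With CASSANDRA_HOST, KAFKA_HOST and KAFKA_PORT exported, deploy_app ingest.json followed by deploy_app dashboard.json would deploy both applications. The sed expressions assume the values contain neither "|" nor "&", which holds for plain host names and port numbers.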

As our data is streaming into the system and we're ready to actually show that data, it's crucial to have the Spark jobs running. The Spark jobs needed for this platform are installed via the DC/OS CLI.

dcos spark run --submit-args='--supervise -Dspark.mesos.coarse=true --driver-cores 1 --driver-memory 1024M --class de.nierbeck.floating.data.stream.spark.KafkaToCassandraSparkApp https://s3.eu-central-1.amazonaws.com/codecentric/big-data/bus-demo/bus-demo-digest-assembly-0.1.0.jar METRO-Vehicles $CASSANDRA_HOST $CASSANDRA_PORT $KAFKA_HOST $KAFKA_PORT'

Again, make sure you replace the placeholders with the actual IPs and ports.
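Note that inside single quotes the shell does not expand $CASSANDRA_HOST and friends, so the command above only works after the placeholders have been edited by hand. A small sketch that builds the same argument string from positional parameters instead; the function names are invented here, while the arguments are exactly those of the command above.

```shell
#!/bin/sh
# digest_submit_args <cassandra-host> <cassandra-port> <kafka-host> <kafka-port>
# Prints the --submit-args value for the KafkaToCassandraSparkApp job.
# printf reuses '%s' for every argument, so the pieces are simply concatenated.
digest_submit_args() {
  printf '%s' "--supervise -Dspark.mesos.coarse=true" \
    " --driver-cores 1 --driver-memory 1024M" \
    " --class de.nierbeck.floating.data.stream.spark.KafkaToCassandraSparkApp" \
    " https://s3.eu-central-1.amazonaws.com/codecentric/big-data/bus-demo/bus-demo-digest-assembly-0.1.0.jar" \
    " METRO-Vehicles $1 $2 $3 $4"
}

# run_digest_job <cassandra-host> <cassandra-port> <kafka-host> <kafka-port>
# Submits the job through the DC/OS Spark CLI.
run_digest_job() {
  dcos spark run --submit-args="$(digest_submit_args "$@")"
}
```

Printing the result of digest_submit_args first is a cheap way to verify the final command line before the job is actually submitted.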

How do we know it is working?

Now that everything seems to be working, how do we know it actually does? One possibility is to check whether the Spark jobs are actually running. To do so, navigate to the DC/OS overview and select the Apache Spark master. It lists the running Spark jobs, and the one we just deployed should show up there. Once we have validated that this job is running, we might want to know how much data is streamed into the system. For that, find the IP address of the KafkaToCassandraSparkApp driver. Open that IP address on port 4040 and you can see how the data is streamed through the system.
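Besides the web UI, a running Spark driver also exposes a small monitoring REST API on the same port, which is handy for a quick check from the command line. A minimal sketch, assuming the driver is reachable from the machine you run it on and using port 4040 as mentioned above; the helper names are hypothetical.

```shell
#!/bin/sh
# driver_api_url <driver-ip> [port]
# Prints the URL of the application list in Spark's monitoring REST API.
# The port defaults to 4040 when it is not given.
driver_api_url() {
  printf 'http://%s:%s/api/v1/applications' "$1" "${2:-4040}"
}

# list_driver_applications <driver-ip> [port]
# Fetches the application list as JSON.
# --fail makes curl return a non-zero exit code on HTTP errors.
list_driver_applications() {
  curl --silent --fail "$(driver_api_url "$@")"
}
```

If list_driver_applications returns a JSON array that contains the streaming job, the driver is up; an error from curl usually means the IP address or port is wrong or the driver is not reachable from your network.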

Now that we know the data is actually streamed into the system, let's take a look at the front end.
When navigating to the front end at :8080 you'll see the OpenLayers front end, where the latest bus data should be flowing in.


SMACK Stack DC/OS style …

… this article has shown that doing SMACK the DC/OS way plain rocks. The ease of use and maturity of the DC/OS platform give great value to the user.
Setting up this showcase takes about half an hour, and then your IoT analytics platform is up and running. From both the application developer's and the operations point of view, this platform has shown that it is ready.
One part is missing in this article and might find its way into a future one: the automatic setup of such a platform. Within the sources of the showcase you'll find hints on how to achieve that with Terraform, but it's out of scope for this article.
DC/OS with an automated setup will give you this application in no time compared to creating a dedicated system on exclusive hardware. This leads us to the next extra you get from running on DC/OS, whether on-premise or cloud-hosted: your application scales as easily as this setup, just by adding additional nodes.

Links:

https://blog.codecentric.de/en/2016/07/iot-analytics-platform/
https://github.com/zutherb/terraform-dcos
https://github.com/ANierbeck/BusFloatingData

Blog author

Achim Nierbeck

Branch Manager
