SMACK stack from the trenches

19.1.2017 | 12 minutes of reading time

This is a summary of the experience gathered on various projects built with the SMACK stack. For details about the SMACK stack itself, you might want to take a look at the blog post “The SMACK Stack – Hands on”.
Apache Spark, the S in SMACK, is used for data analysis, either on real-time data streaming into the system or on already stored data in batches.
Apache Mesos, the M in SMACK, is the foundation of the stack. All applications run on it. In our projects we used Mesosphere’s DC/OS on top of Apache Mesos to install and administer the stack and our own applications.
Lightbend’s Akka, the A in SMACK, is used for fast data stream processing. In most use cases it was used either for fast ingestion of data or for fast extraction through in-stream processing.
Apache Cassandra, the C in SMACK, is the fast write and read storage of the fast data processing platform.
Apache Kafka, the K in SMACK, is the intermediate storage for streaming data. It helps to decouple the incoming and outgoing application logic while still being fast enough to add no noticeable delay to the stream of data.

In those projects the architecture looked roughly as follows.

The ingestion, implemented with Akka, is a bit like Enterprise Integration on steroids. Instead of a lot of different connectors you end up with just a few entry points, but those are very fast. Every project required fast input and storage of the data as well as having that data visible in near real time. That’s where Akka comes into play again, either connected to Kafka for real-time streaming of data via WebSockets or as a connector to Cassandra.
All scenarios running this stack were built on AWS as the cloud provider. This made it especially easy to set up and tear down the stack for development.

DC/OS – Apache Mesos

Apache Mesos in combination with Mesosphere’s DC/OS is the foundation for working with the SMACK stack. The Mesos kernel runs on every machine and provides applications (e.g., Hadoop, Spark, Kafka, Elasticsearch) with APIs for resource management and scheduling across entire datacenter and cloud environments.
On top of Mesos, Mesosphere’s DC/OS gives you a command line interface and a nice integration with Mesosphere’s Marathon. The command line interface in particular, combined with Ansible, can be used to automate the setup of a whole cluster.
The following shell command installs the Cassandra framework used by DC/OS.

dcos package install cassandra

Afterwards, the command dcos package list can be used to verify that the installation succeeded.

While Apache Mesos acts as the “kernel”, Marathon acts as the “init.d” on top of it. Marathon makes sure deployed applications are running successfully and restarts them in case of a failure. While Marathon takes care of long-running tasks, Mesosphere’s Metronome is in charge of short-running, cron-like tasks.
Marathon and Metronome are so-called Mesos frameworks. A framework reserves resources from Mesos, schedules the application to be launched and sometimes monitors those applications.
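
To make the “init.d” role of Marathon a bit more tangible, here is a minimal sketch of what a long-running application could look like from Marathon’s point of view. The app id, the command, the /health endpoint and all sizing values are hypothetical and not taken from the projects described here; only the field names follow Marathon’s app definition format.

```python
import json


def marathon_app(app_id: str, cmd: str, instances: int,
                 cpus: float = 1.0, mem_mb: int = 1024) -> dict:
    """Build a minimal Marathon app definition as a plain dict."""
    return {
        "id": app_id,
        "cmd": cmd,
        "cpus": cpus,
        "mem": mem_mb,
        "instances": instances,
        # Marathon restarts tasks whose health check keeps failing.
        "healthChecks": [{
            "protocol": "HTTP",
            "path": "/health",  # hypothetical health endpoint of the app
            "portIndex": 0,
            "gracePeriodSeconds": 300,
            "intervalSeconds": 60,
            "maxConsecutiveFailures": 3,
        }],
    }


if __name__ == "__main__":
    # Hypothetical ingestion service running with four instances.
    app = marathon_app("/smack/ingest", "bin/ingest-service", instances=4)
    print(json.dumps(app, indent=2))
```

The printed JSON could be saved to a file and handed to Marathon, for example with dcos marathon app add followed by the file name.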

Apache Cassandra

The nice thing about DC/OS is that it makes it very easy to run Apache Cassandra via a specialized Mesos framework. This framework not only helps with installing a Cassandra cluster on top of Mesos, it also handles the recovery of failed instances. The configuration and sizing of Cassandra is crucial for a high-performance fast data platform based on the SMACK stack.

Storage

For performance reasons it’s best to start with SSD storage, but EBS volumes can be used as well. Using EBS we never experienced serious shortcomings, though we did seem to see an increase in write queues. This usually happens if the commit log and the SSTables are written to the same storage. In cases like these it’s crucial to have a fast-connected EBS volume at hand.

CPU

In short, more CPU gives you more throughput. As Cassandra uses different thread pools for the write and read paths, increasing the number of CPUs helps tremendously, because those thread pools get more dedicated CPUs.

Memory

As Cassandra was designed with commodity hardware in mind, a large heap is of no use for a Cassandra instance. A lot of RAM is rather useful for the underlying system itself, as the page cache will consume all memory not used by other applications. Therefore, configuring Cassandra with 8GB of heap and leaving 24GB to the underlying system for the page cache is sufficient. It is crucial, though, to make sure the new generation of the heap is configured properly. In our setup the new generation was set to 1400MB, which is equivalent to 100MB times the number of CPUs, as recommended in the Cassandra documentation. The rule of thumb is to use either 100MB times the number of CPUs or a quarter of the maximum heap, whichever is lower.
As the system runs on a Java 8 runtime, the G1 garbage collector can be used for garbage collection.
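
The rule of thumb for the new generation size is easy to write down as a small calculation. The following is only a sketch of that rule; the function name is made up for this example, and the core counts in the loop are illustrative values rather than measurements from our clusters.

```python
def new_gen_heap_mb(cpu_cores: int, max_heap_mb: int) -> int:
    """Rule of thumb: the lesser of 100 MB per CPU core and a quarter of the max heap."""
    return min(100 * cpu_cores, max_heap_mb // 4)


if __name__ == "__main__":
    max_heap_mb = 8 * 1024  # the 8GB heap mentioned above
    for cores in (4, 14, 32):  # illustrative core counts
        print(f"{cores} cores -> {new_gen_heap_mb(cores, max_heap_mb)} MB new generation")
```

With an 8GB heap, the quarter-of-the-heap limit (2048MB) only takes over once a machine has more than 20 cores; below that, the 100MB-per-core term decides.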

Apache Kafka

Like Apache Cassandra and other big data systems, Kafka has been designed with commodity hardware in mind. Therefore around 5GB of heap is enough for the Kafka process. Here again, having enough RAM for the disk caches is more important than Kafka’s heap size. The same principle applies to storage for Kafka: SSDs should be favored, but fast-connected EBS storage is sufficient. Kafka and its thread pools also profit greatly from a higher number of CPUs.
As Cassandra and Kafka consume rather a lot of CPU and disk in a production use case, one should consider running those frameworks on dedicated machines. This does break the rule of “every framework is treated equal”, but the page caching mechanism in particular only works well if those applications are the only users of a machine’s resources.

Apache Spark

Apache Spark is already optimized to run on Apache Mesos. After installing it via dcos package install spark, Spark is ready to be used. Spark comes with a default Mesos scheduler, the MesosClusterDispatcher, also known as the Spark master. All Spark jobs register themselves with the master and in turn become Mesos frameworks themselves. The job’s driver negotiates the required resources with Mesos and from there on takes care of the executors for Spark. In this DC/OS scenario the executors are Docker images with a DC/OS-optimized configuration. They already contain the configuration for accessing an HDFS inside the DC/OS cluster.

Metrics

Spark metrics are nice to watch in real time, but it would be better to have them available all the time, especially since this data is gone once the driver dies. One way is to use the Spark history server. The history server is nice, but requires an HDFS to be available. This alone isn’t much of a downside, but the requirements for running HDFS on DC/OS are rather high: at least 5 instances with lots of disk space are required. That is rather expensive just for taking a look at the Spark metrics. A good alternative is therefore to use ELK (Elasticsearch, Logstash and Kibana) for monitoring. But how do we get to the logs of Spark? In a “regular” environment you’d usually just add some logging details to the executors, but as our executors are started by Mesos and managed by the Spark driver, this needs some extra tweaking in the DC/OS world.

To enable the Spark metrics, a configuration like the following is needed:

# Enable Slf4jSink for all instances by class name
*.sink.slf4j.class=org.apache.spark.metrics.sink.Slf4jSink

# Polling period for Slf4JSink
*.sink.slf4j.period=1

*.sink.slf4j.unit=seconds

With these settings Spark writes all available metrics to the standard logging mechanism. The tricky part is to actually have those settings enabled inside the Docker image provided by DC/OS. This isn’t possible with the provided image, so we built a custom Docker image that already contains these settings.
The Dockerfile for such a preconfigured Docker image is shown below; there is not much magic to it.

FROM mesosphere/spark:1.0.2-2.0.0

ADD ./conf/metrics.properties /opt/spark/dist/conf/

External Storage

As seen in the previous section, running an HDFS can be quite costly in terms of storage. In this scenario an extra HDFS isn’t required, so storing data in S3 is a quick win. As with the metrics, DC/OS’ own Spark image is optimized for HDFS and therefore needs some tweaking for accessing S3.
First, it is crucial to have the libraries for accessing S3 available in your Docker image. Second, you need to make sure those newly available libraries are present in the configuration of your executor.

spark-defaults.conf content:

spark.hadoop.fs.s3a.impl=org.apache.hadoop.fs.s3a.S3AFileSystem
spark.executor.extraClassPath /opt/spark/dist/extralibs/aws-java-sdk-1.7.4.jar:/opt/spark/dist/extralibs/hadoop-aws-2.7.2.jar:/opt/spark/dist/extralibs/joda-time-2.9.jar
spark.driver.extraClassPath /opt/spark/dist/extralibs/aws-java-sdk-1.7.4.jar:/opt/spark/dist/extralibs/hadoop-aws-2.7.2.jar:/opt/spark/dist/extralibs/joda-time-2.9.jar

In the configuration above, the first line sets the filesystem implementation to S3A, which is needed for faster access to S3. The Apache Hadoop driver supports different ways of accessing S3: the default S3 filesystem uses block-based access, while S3N and S3A use object-based access. Hadoop now only supports S3A, as it’s the successor of S3N (native). The extraClassPath entries are needed for the driver as well as for the executor. Making sure those libraries are actually available is an additional step when creating your own Docker image.
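
Because the same list of jars has to appear in both extraClassPath entries, it can help to generate those lines instead of maintaining them by hand. The following is a small sketch of that idea, not the way the configuration was produced in our projects; the helper names are made up, while the jar names and the directory are the ones from the configuration above.

```python
EXTRA_LIBS_DIR = "/opt/spark/dist/extralibs"
S3_JARS = [
    "aws-java-sdk-1.7.4.jar",
    "hadoop-aws-2.7.2.jar",
    "joda-time-2.9.jar",
]


def extra_classpath(jars, base_dir=EXTRA_LIBS_DIR):
    """Join the jar file names into a colon-separated classpath."""
    return ":".join(f"{base_dir}/{jar}" for jar in jars)


def spark_defaults(jars):
    """Render the three spark-defaults.conf lines needed for S3A access."""
    classpath = extra_classpath(jars)
    return "\n".join([
        "spark.hadoop.fs.s3a.impl=org.apache.hadoop.fs.s3a.S3AFileSystem",
        f"spark.executor.extraClassPath {classpath}",
        f"spark.driver.extraClassPath {classpath}",
    ])


if __name__ == "__main__":
    print(spark_defaults(S3_JARS))
```

Generating both entries from one list keeps the driver and executor classpaths from drifting apart when a library version changes.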

Using a custom Docker spark image

When submitting Spark jobs via DC/OS, you’ll usually issue a command like the following:

dcos spark run --submit-args='--driver-cores 0.1 --driver-memory 1024M --class org.apache.spark.examples.SparkPi https://downloads.mesosphere.com/spark/assets/spark-examples_2.10-1.4.0-SNAPSHOT.jar 10000000'

Now if you want to use your own custom Docker image, you need to adapt the command as follows:

dcos spark run --submit-args='…' --docker-image=my-own-docker/spark-driver

In that case you need to make sure this Docker image is accessible to your Mesos agents.
Because installing those Spark jobs can be cumbersome, we used Metronome for an easy installation of them.
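
When the submit command is generated by a script rather than typed by hand, it can be assembled from its parts. The sketch below only builds and prints the command line; the function name is made up for this example, and the image name is the placeholder used above. It uses nothing beyond the two options shown in this section.

```python
import shlex
from typing import List, Optional


def dcos_spark_run(submit_args: str, docker_image: Optional[str] = None) -> List[str]:
    """Build the argument list for dcos spark run, optionally with a custom image."""
    argv = ["dcos", "spark", "run", f"--submit-args={submit_args}"]
    if docker_image is not None:
        argv.append(f"--docker-image={docker_image}")
    return argv


if __name__ == "__main__":
    argv = dcos_spark_run(
        "--driver-cores 0.1 --driver-memory 1024M "
        "--class org.apache.spark.examples.SparkPi "
        "https://downloads.mesosphere.com/spark/assets/"
        "spark-examples_2.10-1.4.0-SNAPSHOT.jar 10000000",
        docker_image="my-own-docker/spark-driver",
    )
    # shlex.join quotes the submit args so the line can be pasted into a shell.
    print(shlex.join(argv))
```

Keeping the arguments as a list until the last moment avoids quoting mistakes around the nested submit arguments.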

Throughput

After all those technical details, let’s take a look at what can be achieved with such a SMACK platform. Our requirements included the ability to process approximately 130 thousand messages per second. Those messages needed to be stored in Cassandra and also be accessible via a frontend for real-time visualization.
This scenario is built on the basic architecture. Akka Streams are used for ingestion and also for the real-time visualization. The ingestion does some minor protocol transformation and publishes the data to a Kafka sink. While 4 nodes are capable of handling these 130k msg/s, 8 ingestion instances are needed to keep up with 520k msg/s.
A high throughput in the ingestion is nice, but how much delay does it produce? For example: are messages queuing somewhere internally? Is the back-pressure too high? While handling the throughput of 520k msg/s, the average latency between message creation and storage in Kafka is 250ms. If the network and ntpd fuzziness are taken into account, the delay produced by the ingestion itself is practically nonexistent.
So the ingestion, and therefore the visualization consumer, is capable of handling the data in real time. How does storing the data in Cassandra compare to this?
The easiest use case for Apache Spark in this scenario is to stream the incoming data into the Apache Cassandra database. While Kafka topics are perfectly suited to be consumed in a streaming way, Cassandra isn’t capable of doing streamed inserts. That’s one of the reasons why most of the time is consumed by connecting to and communicating with Cassandra. Therefore it’s better to have a batch interval of about 10 seconds. The baseline for connecting, i.e. the minimum time needed just to get going, is about 1 to 2 seconds, depending on the underlying hardware and the number of CPUs allocated to the executors. The 10-second window also helps with handling possible but unwanted downtimes of the Spark job: with this time buffer it’s possible to bridge a longer downtime. Another reason to keep the 10-second window is that data might be held back in the pipeline before being consumed by Spark; the window also absorbs such peak loads.
So let’s talk numbers: a Spark job storing data into Cassandra in 10-second batches takes about 4 seconds for the 1.3 million events within the batch window. Storing the 5.2 million events of a batch window takes 8 seconds. As Cassandra plays a rather large part when streaming data from Kafka to Cassandra via Spark, it also needs to scale horizontally with the amount of data. The following table gives a brief overview of the number of messages, the average processing time per batch and the number of Cassandra nodes needed to handle that number of messages per second.

Throughput	Time needed per 10s batch	Cassandra nodes
130k msg/s	4s	5
260k msg/s	6s	10
520k msg/s	8s	15
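
The numbers in the table can be cross-checked with a bit of arithmetic. The following sketch takes the three measurements from the table as input and derives the events per batch window and the time left before the next batch arrives; the function names are made up for this example.

```python
BATCH_INTERVAL_S = 10

# (messages per second, seconds needed per batch, Cassandra nodes), from the table above
MEASUREMENTS = [
    (130_000, 4, 5),
    (260_000, 6, 10),
    (520_000, 8, 15),
]


def events_per_batch(msgs_per_s: int, interval_s: int = BATCH_INTERVAL_S) -> int:
    """Number of events that accumulate during one batch interval."""
    return msgs_per_s * interval_s


def headroom_s(batch_time_s: int, interval_s: int = BATCH_INTERVAL_S) -> int:
    """Seconds left in the interval after the batch has been stored."""
    return interval_s - batch_time_s


if __name__ == "__main__":
    for rate, batch_time, nodes in MEASUREMENTS:
        print(
            f"{rate} msg/s: {events_per_batch(rate)} events per batch, "
            f"{headroom_s(batch_time)} s headroom, "
            f"{rate / nodes:.0f} msg/s per Cassandra node"
        )
```

At 520k msg/s the job still finishes 2 seconds before the next batch is due, which is the buffer that keeps the stream from falling behind.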

When measuring the throughput of simple events of approximately 60 bytes each, the initial upper limit was reached at around 500k msg/s. It turned out to be a network capacity limit of the selected AWS instance type. This limit of 20MB/s correlated nicely with the approximately 500k msg/s and their corresponding byte size. After selecting another instance type with better network performance as the entry point for the incoming data, processing 520k msg/s was easily achieved.

Conclusion

Sometimes working with bleeding-edge software is like grabbing at falling daggers. On the other hand, it’s great to see it turn into an extremely powerful and capable stack. Over time, the functionality provided by Mesosphere DC/OS became more stable and more sophisticated. The out-of-the-box frameworks for the SMACK stack play together nicely, and thanks to those frameworks the stack is resilient to failures. A “misbehaving” app is stopped by the framework and restarted within the same minute. This fail-safe and fault-tolerant behavior gives great confidence in the running cluster, as it rarely needs operations staff to step in.
It may seem obvious, but this stack scales linearly regarding performance, throughput and hardware used. Not only does it scale up, it also allows you to start with a smaller stack. Using Apache Mesos as the foundation especially helps to get the most out of the resources you have. This is particularly useful while still developing the stack. In this case Apache Cassandra and Apache Kafka may run on the same node, as throughput isn’t the main goal yet.
While monitoring is needed, it’s still not provided out of the box, so a custom solution is required. With DC/OS 1.10 you’ll most likely have a solution for metrics. Frameworks provided by DC/OS place a collecting module next to the application, which collects the metrics of that process. Those metrics are collected locally and sent to a central server based on Apache Kafka. This will be an exciting new functionality for measuring metrics of your cluster in the future.

Blog author

Achim Nierbeck

Branch manager
