LANGUAGE

Lookup additional data in Spark Streaming

1.6.2017 | 7 minutes of reading time

When processing streaming data, the raw data from the events are often not sufficient. Additional data must be added in most cases, for example metadata for a sensor, of which only the ID is sent in the event.

In this blog post I would like to discuss various ways to solve this problem in Spark Streaming. The examples assume that the additional data is initially outside the streaming application and can be read over the network – for example in a database. All samples and techniques refer to Spark Streaming and not to Spark Structured Streaming. The main techniques are

broadcast: static data
mapPartitions: for volatile data
mapPartitions + connection broadcast: effective connection handling
mapWithState: speed up by a local state

Broadcast

Spark has an integrated broadcasting mechanism that can be used to transfer data to all worker nodes when the application is started. This has the advantage, in particular with large amounts of data, that the transfer takes place only once per worker node and not with each task.

However, because the data can not be updated later, this is only an option if the metadata is static. This means that no additional data, for example information about new sensors, may be added, and no data may be changed. In addition, the transferred objects must be serializable.

In this example, each sensor type, stored as a numerical ID (1,2, …), is to be replaced by a plain-text name in the stream processing (tire temperature, tire pressure, ..). It is assumed that the assignment type ID -> name is fixed.

1val namesForId: Map[Long,String] = Map(1 -> "Wheel-Temperature", 2 -> "Wheel-Pressure")
2stream.map (typId => (typId,namesForId(typId)))

A lookup without broadcast. The map is serialized for each task and transferred to the worker nodes, even if tasks were previously executed on the worker.

1val namesForId: Map[Long,String] = Map(1 -> "Wheel-Temperature", 2 -> "Wheel-Pressure")
2val namesForIdBroadcast = sc.broadcast(namesForId)
3stream.map (typId => (typId,namesForIdBroadcast.value(typId)))

The map is distributed to the workers via a broadcast and no longer has to be transferred for each task.

MapPartitions

The first way to read non-static data is in a map() operation. However, not map() should be used but mapPartitions(). mapPartitions() is not called for every single element, but for each partition, which then contains several elements. This allows to connect to the database only once per partition and then to reuse the connection for all elements.

There are two different ways to query the data: Use a bulk API to process all elements of the partition together, or an asynchronous variant: an asynchronous, non-blocking query is issued for each entry and the results are then collected.

1wikiChanges.mapPartitions(elements => {
2  Session session = // create database connection and session
3  PreparedStatement preparedStatement = // prepare statement, if supported by database
4  elements.map(element => {
5    // extract key from element and bind to prepared statement
6    BoundStatement boundStatement = preparedStatement.bind(???)
7    session.asyncQuery(boundStatement) // returns a Future
8  })
9  .map(...) //retrieve value from future
10})

An example for an lookup on data stored in Cassandra using mapPartitions and asynchronous queries

The above example shows a lookup using mapPartitions: expensive operations like opening the connection are only done once per partition. An asynchronous, non-blocking query is issued for each element, and then the values are determined from the futures. Some libraries for reading from databases mainly use this pattern, such as the joinWithCassandraTable from the Spark Cassandra Connector .

Why is the connection not created at the beginning of the job and then used for each partition? For this purpose, the connection would have to be serialized and then transferred to the workers for each task. The amount of data would not be too large, but most connection objects are not serializable.

Broadcast Connection + MapPartitions

However, it is a good idea not to rebuild the connection for each partition, but only once per worker node. To achieve this, the connection is not broadcasted because it is not serializable (see above), but instead a factory that builds the connection on the first call and then returns this connection on all other calls. This function is then called in mapPartitions() to get the connection to the database.

In Scala it is not necessary to use a function for this. Here a lazy val can be used. The lazy val is defined within a wrapper class. This class can be serialized and broadcasted. On the first call, an instance of the non-serializable connection class is created on the worker node and then returned for every subsequent call.

1class DatabaseConnection extends Serializable {
2  lazy val connection: AConnection = {
3    // all the stuff to create the connection
4    new AConnection(???)
5  }
6}
7val connectionBroadcast = sc.broadcast(new DatabaseConnection)
8incomingStream.mapPartitions(elements => {
9  val connection = connectionBroadcast.value.connection
10  // see above
11})

A connection creation object is broadcasted and then used to retrieve the actual connection on the worker node.

MapWithState()

All solution approaches shown so far retrieve the data from a database, if necessary. This usually means a network call for each entry or at least for each partition. It would be more efficient to have the data directly in-memory available.

With mapWithState() Spark itself offers a way to change data by means of a state and, in turn, also to adjust the state. The state is managed by a key. This key is used to distribute the data in the cluster, so that all data must not be kept on each worker node. An incoming stream must therefore also be constructed as a key-value pair.

This keyed state can also be used for a lookup. By means of initialState(), an RDD can be passed as an initial state. However, any updates can only be performed based on a key. This also applies to deleting entries. It is not possible to completely delete or reload the state.

To update the state, additional notification events must be present in the stream. These can, for example, come from a separate Kafka topic and must be merged with the actual data stream (union()). The amount of data sent, can range from a simple notification with an ID, which is then used to read the new data, to the complete new data set.

Messages are published to the Kafka topic, for example, if metadata is updated or newly created. In addition, timed events can be published to the Kafka topic or can be generated by a custom receiver in Spark itself.

A simple implementation can look like this. First, the Kafka topics are read and the keys are additionally supplemented with a marker for the data type (data or notification). Then, both streams are merged into a common stream and processed in mapWithState(). The state was previously specified by passing the function of the state to the StateSpec.

1val kafkaParams = Map("metadata.broker.list" -> brokers)
2val notifications = notificationsFromKafka
3  .map(entry => ((entry._1, "notification"), entry._2))
4val data = dataFromKafka
5  .map(entry => ((entry._1, "data"), entry._2))
6val lookupState = StateSpec.function(lookupWithState _)
7notifications
8  .union(data)
9  .mapWithState(lookupState)

The lookupWithState function describes the processing in the state. The following parameters are passed:

batchTime: the start time of the current microbatch
key: the key, in this case the original key from the stream, together with the type marker (data or notification)
valueOpt: the value to the key in the stream
state: the value stored in the state for the key

A tuple consisting of the original key and the original value as well as a number will be returned. The number is taken from the state or – if not already present in the state – is chosen randomly.

1def lookupWithState(batchTime: Time, key: (String, String), valueOpt: Option[String], state: State[Long]): Option[((String, String), Long)] = {
2  key match {
3    case (originalKey, "notification") =>
4      // retrieve new value from notification or external system
5      val newValue = Random.nextLong()
6      state.update(newValue)
7      None // no downstream processing for notifications
8    case (originalKey, "data") =>
9      valueOpt.map(value => {
10        val stateVal = state.getOption() match {
11          // check if there is a state for the key
12          case Some(stateValue) => stateValue
13          case None =>
14            val newValue = Random.nextLong()
15            state.update(newValue)
16            newValue
17        }
18      ((originalKey, value), stateVal)
19      })
20  }
21}

In addition, the timeout mechanism of the mapWithState() can also be used to remove events after a certain time without updating from the state.

Conclusion

Loading additional information is a common problem in streaming applications. With Spark Streaming, there are a number of ways to accomplish this.

The easiest way is to broadcast static data at the start of the application. For volatile data, read per partition is easy to implement and provides a solid performance. With the use of the Spark states, the speed can be increased further, but it is more complex to develop.

Optimally, the data is always directly present on the worker node, on which the data is processed. This is the case, for example, with the use of Spark states. Kafka streams pursue this approach even more consistently. Here, a table is treated as a stream and – provided the streams are identical partitioned – distributed in the same way as the original stream. This makes local lookups possible.

Apache Flink is also working on efficient lookups, here under the title Side Inputs .

Was this post helpful?

LANGUAGE

Likes

Blog author

Matthias Niehoff

Head of Data

Do you still have questions? Just send me a message.

fromMatthias Niehoff

Zukunftssichere Observability mit OpenTelemetry

Observability, also die Möglichkeit, das Verhalten von Anwendungen in Echtzeit zu überwachen, Fehler schnell zu identifizieren und Probleme proaktiv anzugehen, ist ein unverzichtbares Element für erfolgreiche digitale Unternehmen. OpenTelemetry ist eine...

Observability

16.6.2023 | 2 Minuten Lesezeit

Matthias Niehoff

Crossplane: Eine Lösung für hybride Cloud-Herausforderungen?

Crossplane ist ein plattformübergreifendes Kontrollsystem (Control-Plane), das das Management von Cloud-Ressourcen vereinfachen und automatisieren soll. Das Tool ermöglicht es, verschiedene Cloud-Provider und lokale Ressourcen, z. B. Kubernetes-Cluster...

Cloud
Cloud Native

12.5.2023 | 2 Minuten Lesezeit

Matthias Niehoff

Experience: Jetzt auch für APIs

APIs spielen eine zentrale Rolle bei der Digitalisierung. Extern angeboten, ermöglichen sie das Erschaffen von Ökosystemen und neuen Geschäftsmodellen. Unternehmen wollen gerne selbst als Plattform gesehen werden, auch hier sind APIs unerlässlich. Intern...

5.4.2023 | 2 Minuten Lesezeit

Matthias Niehoff

Team Topologies: Ein Gedankenmodell für leistungsstarke Teams

Dass die Aufbau- und Ablauforganisation eines Unternehmens wichtig für eine schnelle und flexible IT ist, ist kein Geheimnis. Folglich gibt es eine Reihe von Ansätzen, die hier für Verbesserungen sorgen sollen: agile Ansätze, SAFe und alles, was es rund...

Agile Methoden
Agile

22.3.2023 | 2 Minuten Lesezeit

Matthias Niehoff

Wie Open Policy Agent Entwickler befähigt, Autorisierungen einfach umzusetzen

Die Frage, was ein Nutzer in einer Anwendung darf, besteht oft aus komplexen Regeln und Konfigurationen, gespeichert in Datenbanken. Regelwerke werden in großen IT-Landschaften in verschiedenen Anwendungen häufig redundant implementiert, teils auch in...

8.3.2023 | 2 Minuten Lesezeit

Matthias Niehoff

Bessere SQL-Datenpipelines mit dbt

SQL ist weiterhin aus der Datenanalyse nicht wegzudenken – es ist vergleichsweise einfach zu lernen und Anwender können es ohne zusätzliche Werkzeuge auf einer Datenbank ausführen. Entsprechend ist es bei vielen Datenanalysten und Engineers beliebt. ...

Data

22.2.2023 | 2 Minuten Lesezeit

Matthias Niehoff

Schneller handeln bei Software-Schwachstellen

Sicherheitslücken in Software und Bibliotheken werden immer auftreten, unabhängig davon, wie viel Energie aufgebracht wird, um sie zu vermeiden. An die als Log4Shell bekannte Schwachstelle vor gut einem Jahr werden sich Viele noch schmerzhaft erinnern...

IT-Security

8.2.2023 | 3 Minuten Lesezeit

Matthias Niehoff

Ist die Cloud der große Umweltsünder?

Rechenleistung und Speicher kosten nicht nur Geld. Sie verbrauchen auch Mengen – potenziell klimaschädlicher – Energie. Das überrascht die Wenigsten, im kollektiven Bewusstsein ist es aber bislang kaum angekommen. Sehr wohl bewusst ist es natürlich ...

Cloud

18.1.2023 | 2 Minuten Lesezeit

Matthias Niehoff

WebAssembly – Mehr als nur ein Web-Standard

Seit 2017 unterstützen moderne Browser bereits WebAssembly (Wasm), seitdem ist der Hype mal größer, mal kleiner. Aber was ist WebAssembly überhaupt und warum wurde es geschaffen? WebAssembly ist ein standardisierter Bytecode, der in einer leichtgewichtigen...

Programmiersprache
Webdevelopment

4.1.2023 | 2 Minuten Lesezeit

Matthias Niehoff

AWS Cloud Development Kit – Infrastructure as Code on Steroids

Infrastructure as Code (IaC) ist inzwischen ein alter Hut. Frameworks wie Terraform, Ansible und andere haben Standards geschaffen. Kaum jemand provisioniert produktive Systeme heute ohne IaC – sei es in der Cloud oder auf der eigenen Infrastruktur. ...

Infrastructure as Code
AWS
Cloud

21.12.2022 | 3 Minuten Lesezeit

Matthias Niehoff

Platform Engineering – Machen das nicht alle schon?

Plattformen sind aktuell ein sehr populäres Konzept, insbesondere in der Softwareentwicklung von Unternehmen. Viele sagen aber auch: So neu ist das doch gar nicht. Wir bieten unseren Entwicklern seit Jahren alle relevanten Tools und Werkzeuge, damit ...

DevOps
Accelerate

7.12.2022 | 2 Minuten Lesezeit

Matthias Niehoff

Data Governance: Wie können wir Daten demokratisieren?

“Data is the new oil” ist inzwischen ein alter Hut. Jedes Unternehmen versucht, Daten besser zu nutzen, sei es, um die eigenen Prozesse zu optimieren, die Kunden besser zu verstehen oder neue Produkte anzubieten. Dabei stellen fast alle fest: Wir haben...

Data Science

23.11.2022 | 2 Minuten Lesezeit

Matthias Niehoff

Machine Learning in der Praxis. Eine Mate mit … Matthias Niehoff #EineMateMit

Machine Learning und künstliche Intelligenz sind aktuell in aller Munde und versprechen vielfältige Einsatzmöglichkeiten im Unternehmen. Trotzdem tun sich viele Unternehmen aktuell noch schwer, das Potential der Technologie zu nutzen. „Der Fokus liegt...

Künstliche Intelligenz
Data
Community
Machine Learning

27.5.2020 | 1 Minuten Lesezeit

Matthias Niehoff

Event time processing in Apache Spark and Apache Flink

With the new release of Spark 2.1, the event-time capabilities of Spark Structured Streaming have been expanded. It is time to take a closer look at the state of support and compare it with Apache Flink – which comes with a broad support for event time...

Big Data
Data
Machine Learning
Streaming

19.4.2017 | 9 Minuten Lesezeit

Matthias Niehoff

Distributed Stream Processing Frameworks for Fast & Big Data

Spark Streaming, Flink, Storm, Kafka Streams – that are only the most popular candidates of an ever growing range of frameworks for processing streaming data at high scale. This article is about the main concepts behind these frameworks. Furthermore...

Big Data
Data
Open Source
Messaging
Machine Learning
Streaming

26.3.2017 | 10 Minuten Lesezeit

Matthias Niehoff

Your job at codecentric?

Jobs

Agile Developer und Consultant (w/d/m)

Alle Standorte

Green Cloud: Daten und Emissionen sparen

Das Internet produziert jährlich 900 Millionen Tonnen CO₂ – das ist deutlich mehr als Deutschland insgesamt emittiert. Hauptverantwortlich ist der immer weiter steigende Stromverbrauch beim Transport und der Speicherung von Daten. Wenn ihr kurz darüber...

Cloud
Green IT
Softwarearchitektur
Data

11.3.2024 | 5 Minuten Lesezeit

Dennis

Charge your APIs Volume 23: REST vs. gRPC

APIs dienen als Verbindungsstück zwischen Daten und Verarbeitung und erlauben uns damit, Daten im richtigen Kontext als Informationen zu interpretieren. Passende fachliche Themen sind dabei präsenter denn je und erreichen bald auch den Endverbraucher...

Java
Softwareentwicklung
Spring
Softwarearchitektur
API
Data

11.2.2024 | 7 Minuten Lesezeit

Sebastian Tiemann

Eine Einführung in Federated Learning im industriellen Kontext: Fortgeschritten

Im Bereich des maschinellen Lernens wurde eine lange Zeit angenommen, dass die Eingabedaten von Modellen und Gewichten sicher sei und nicht extrahiert werden könnten. In den letzten Jahren veröffentlichte Forschung hat diese Annahme in Frage gestellt...

Machine Learning
Big Data
Data Science
Data

18.9.2023 | 8 Minuten Lesezeit

Ihsan Kisi

Eine Einführung in Federated Learning im industriellen Kontext: Grundlagen

Mithilfe von Daten können Unternehmen fundiertere Entscheidungen treffen, ihre Arbeitsabläufe optimieren und mit der Kraft des maschinellen Lernens (ML) einen Vorteil in der wettbewerbsintensiven Geschäftswelt erlangen. Allerdings ist der Umgang mit ...

Machine Learning
Data Science
Data
Big Data

25.8.2023 | 7 Minuten Lesezeit

Ihsan Kisi

Datenanalyse auf die schnelle Art – mit Amazon Athena und GitLab

Wenn wir Erkenntnisse aus großen Datenmengen gewinnen wollen, bieten uns Cloud Service Provider inzwischen Lösungen an, dank derer wir uns kein Data Warehouse oder Hadoop-Cluster mehr in den Keller stellen müssen. AWS hat mit Athena, RedShift und EMR...

Cloud
Big Data
AWS
Serverless
GitLab

21.3.2023 | 16 Minuten Lesezeit

Maik Fleuter

Bessere SQL-Datenpipelines mit dbt

Data

22.2.2023 | 2 Minuten Lesezeit

Matthias Niehoff

Streaming Wikipedia mit Apache Kafka

Apache Kafka ist in aller Munde und entwickelt sich im Kontext von verteilten Systemen zum De-facto-Standard als Plattform für Event Streaming. Im Rahmen unserer OffProject Time (Weiterbildungszeit) haben wir uns die Plattform auch näher angeschaut und...

Kotlin
Data
Java
Messaging
Spring

15.8.2022 | 10 Minuten Lesezeit

Christoph Metzger

Felix Rieß

Einführung in die Welt der Tourenoptimierung – Echte Routen und realistischere...

In diesem Artikel möchte ich euch mit einem Python Jupyter Notebook zeigen, wie ihr Anwendungsfälle der Tourenoptimierung inklusive Nebenbedingungen lösen und visualisieren könnt. Außerdem zeige ich euch, wie ihr mit OpenStreetMaps die Route zwischen...

Data

21.6.2022 | 7 Minuten Lesezeit

Lukas Heidemann

Einführung in die Welt der Tourenoptimierung – Visualisierung und Lösungsverfahren...

In diesem Artikel möchte ich euch zeigen, wie ihr Probleme der Tourenoptimierung in einem Python Jupyter Notebook lösen und visualisieren könnt. Am Beispiel eines Fahrradkurierdienst zeige ich außerdem, wie das Grundproblem um gängige Nebenbedingungen...

Data

16.6.2022 | 9 Minuten Lesezeit

Lukas Heidemann

Einführung in die Welt der Tourenoptimierung (1/3)

In vielen Unternehmen fallen täglich verschiedene Transportprozesse an. Klassische Beispiele sind die Optimierung von Warenein- und ausgängen, die Einsatzplanung von Servicetechnikern oder die optimale Reihenfolge der Auslieferung bei Lieferdiensten....

Data

12.6.2022 | 8 Minuten Lesezeit

Lukas Heidemann

Machine-Learning-Modelle bewerten – Quality Gates etablieren

Die Qualität bzw. Nützlichkeit von Machine-Learning-Modellen lässt sich mit Hilfe von Testdaten und Metriken bewerten. Allerdings in welchem Umfang? Manuell, automatisiert, einmalig, regelmäßig? Manuell lassen sich die ersten Modelle als Ergebnis eines...

Data
Machine Learning
Softwareentwicklung
CI/CD

7.12.2021 | 7 Minuten Lesezeit

Berthold Schulte

Schnelles Training eines Recommendation-Modells durch BigQuery ML

Machine Learning (ML) kann nur durch Modelle in der Produktion Business Value erzeugen. Allerdings kann die Zeitspanne zwischen der Entwicklung der nächsten Iteration eines Modells und dessen Einsatz in einer Produktionsumgebung massiv sein. Dies gilt...

Accelerate
Cloud
Data
Google Cloud
Machine Learning

26.7.2021 | 11 Minuten Lesezeit

Niklas Haas

Timo Böhm

KI, Daten und Infrastruktur – ML-Systeme schnell Ende-zu-Ende verproben...

Heutzutage steht fast alles, was mit den Labels „künstliche Intelligenz (KI)“ oder „Machine Learning (ML)“ versehen ist, für Fortschritt. Seltsamerweise schließt diese Assoziation jedoch häufig die Themen Daten und Dateninfrastruktur nicht ausreichend...

Kultur
Data
Machine Learning

21.6.2021 | 12 Minuten Lesezeit

Marcel Mikl

Schnelles KI-Prototyping mit Google Cloud AutoML Vision

Bei klassischen Machine-Learning-(ML-)Projekten beschäftigen sich Data Scientists häufig längere Zeit (mehrere Monate) mit der Entwicklung eines ML-Modells. Dabei werden hohe Kosten verursacht und die Zeit, bis ein erstes Modell zur Verfügung steht, ...

Cloud
Computer Vision
Data
Künstliche Intelligenz
Google Cloud
Machine Learning

17.5.2021 | 5 Minuten Lesezeit

Nils Bauroth

Sven Rediske

The Good, the Bad and the Ugly: Daten effektiv visualisieren und kommunizieren

Dieser Artikel begleitet meinen Vortrag The Good, the Bad and the Ugly: Daten effektiv visualisieren und kommunizieren, den ich am 20.10.2020 auf der data2day gehalten habe.Datenvisualisierung ist ausschlaggebend für Verständnis und KommunikationDatenvisualisierung...

Data
Data Science

19.10.2020 | 11 Minuten Lesezeit

Shirin Elsinghorst

KI in der Praxis: Fehlerhafte Bauteile mit Rekognition auf AWS identifizieren

Noch vor kurzer Zeit mussten für den Einsatz von künstlicher Intelligenz (KI) unter großem Aufwand eigene KI-Modelle erstellt werden. Heute ist für viele Anwendungsfälle die Einstiegshürde in die Welt der KI durch Cloud-Computing-Dienste stark gesunken...

Cloud
Computer Vision
Data
Künstliche Intelligenz
Machine Learning
Python

29.7.2020 | 11 Minuten Lesezeit

Marcel Mikl

Nico Axtmann

KI in der Praxis: Fehlerhafte Bauteile mit AutoML in der Google Cloud ...

Noch vor kurzer Zeit war der Einsatz von künstlicher Intelligenz (KI) nur mit großem Aufwand und Konstruktion eigener neuronaler Netze möglich. Heute ist die Einstiegshürde in die Welt der KI durch Cloud-Computing-Dienste stark gesunken. So kann man ...

Cloud
Computer Vision
Data
Python
Machine Learning
Google Cloud
Künstliche Intelligenz

8.7.2020 | 11 Minuten Lesezeit

Nico Axtmann

Marcel Mikl

KI für KMU: (Teil-)Automatisierung der Qualitätskontrolle von Bauteilen

Noch vor kurzer Zeit war der Einsatz von künstlicher Intelligenz (KI) nur mit großem Aufwand und ausreichend Spezialwissen möglich. Hauptsächlich große Internet-Konzerne wie Google, Apple und Facebook hatten das Geld, die Daten und die Expertise, um ...

Data
Machine Learning
Künstliche Intelligenz

6.7.2020 | 7 Minuten Lesezeit

Marcel Mikl

Nico Axtmann

Machine Learning in der Praxis. Eine Mate mit … Matthias Niehoff #EineMateMit

Künstliche Intelligenz
Data
Community
Machine Learning

27.5.2020 | 1 Minuten Lesezeit

Matthias Niehoff

Process Mining mit bupaR

Process Mining schafft Transparenz darüber, was wirklich in Unternehmen geschieht. Im Prozessmanagement werden die Idealvorstellungen eines Prozesses meist langwierig definiert. In der Praxis ist die Qualität dieser Beschreibungen jedoch oft nicht eindeutig...

Open Source
Data
Process Management

5.5.2020 | 9 Minuten Lesezeit

Anna Lukas

Gemeinsam bessere Projekte umsetzen.

Wir helfen deinem Unternehmen.

Du stehst vor einer großen IT-Herausforderung? Wir sorgen für eine maßgeschneiderte Unterstützung. Informiere dich jetzt.

Hilf uns, noch besser zu werden.

Wir sind immer auf der Suche nach neuen Talenten. Auch für dich ist die passende Stelle dabei.

Contact

Send

Lookup additional data in Spark Streaming

Broadcast

A lookup without broadcast. The map is serialized for each task and transferred to the worker nodes, even if tasks were previously executed on the worker.

The map is distributed to the workers via a broadcast and no longer has to be transferred for each task.

MapPartitions

An example for an lookup on data stored in Cassandra using mapPartitions and asynchronous queries

Broadcast Connection + MapPartitions

A connection creation object is broadcasted and then used to retrieve the actual connection on the worker node.

MapWithState()

Conclusion

Was this post helpful?

Ja

Blog author

Get in contact

Get in contact

More articles

Zukunftssichere Observability mit OpenTelemetry

Crossplane: Eine Lösung für hybride Cloud-Herausforderungen?

Experience: Jetzt auch für APIs

Team Topologies: Ein Gedankenmodell für leistungsstarke Teams

Wie Open Policy Agent Entwickler befähigt, Autorisierungen einfach umzusetzen

Bessere SQL-Datenpipelines mit dbt

Schneller handeln bei Software-Schwachstellen

Ist die Cloud der große Umweltsünder?

WebAssembly – Mehr als nur ein Web-Standard

AWS Cloud Development Kit – Infrastructure as Code on Steroids

Platform Engineering – Machen das nicht alle schon?

Data Governance: Wie können wir Daten demokratisieren?

Machine Learning in der Praxis. Eine Mate mit … Matthias Niehoff #EineMateMit

Event time processing in Apache Spark and Apache Flink

Distributed Stream Processing Frameworks for Fast & Big Data

Your job at codecentric?

Agile Developer und Consultant (w/d/m)

View Job

More articles in this subject area

Green Cloud: Daten und Emissionen sparen

Charge your APIs Volume 23: REST vs. gRPC

Eine Einführung in Federated Learning im industriellen Kontext: Fortgeschritten

Eine Einführung in Federated Learning im industriellen Kontext: Grundlagen

Datenanalyse auf die schnelle Art – mit Amazon Athena und GitLab

Bessere SQL-Datenpipelines mit dbt

Streaming Wikipedia mit Apache Kafka

Einführung in die Welt der Tourenoptimierung – Echte Routen und realistischere...

Einführung in die Welt der Tourenoptimierung – Visualisierung und Lösungsverfahren...

Einführung in die Welt der Tourenoptimierung (1/3)

Machine-Learning-Modelle bewerten – Quality Gates etablieren

Schnelles Training eines Recommendation-Modells durch BigQuery ML

KI, Daten und Infrastruktur – ML-Systeme schnell Ende-zu-Ende verproben...

Schnelles KI-Prototyping mit Google Cloud AutoML Vision

The Good, the Bad and the Ugly: Daten effektiv visualisieren und kommunizieren

KI in der Praxis: Fehlerhafte Bauteile mit Rekognition auf AWS identifizieren

KI in der Praxis: Fehlerhafte Bauteile mit AutoML in der Google Cloud ...

KI für KMU: (Teil-)Automatisierung der Qualitätskontrolle von Bauteilen

Machine Learning in der Praxis. Eine Mate mit … Matthias Niehoff #EineMateMit

Process Mining mit bupaR

Gemeinsam bessere Projekte umsetzen.

Wir helfen deinem Unternehmen.

Unsere Leistungen

Hilf uns, noch besser zu werden.

Zu den Jobangeboten