Distributed Stream Processing Frameworks for Fast & Big Data

26.3.2017 | 10 minutes of reading time

Spark Streaming, Flink, Storm, Kafka Streams: these are only the most popular candidates in an ever-growing range of frameworks for processing streaming data at scale. This article covers the main concepts behind these frameworks and then briefly classifies three Apache projects: Spark Streaming, Flink and Kafka Streams.

Why Stream Processing?

The processing of streaming data is gaining in importance due to the steadily growing number of data sources that continuously produce and offer data. In addition to the omnipresent Internet of Things, these include, for example, click streams, data in the advertising business, as well as device and server logs.

Infinite and continuous data is not a new phenomenon. Much existing data already fits this pattern. For example, changes to master data occur continuously, but only at low frequency. Master data is typically processed according to the classic request/response pattern. In the case of non-time-critical changes or larger volumes, the data is often collected first and then processed periodically by batch jobs, which run, for example, every night or at shorter intervals.

However, daily intervals are often not sufficient. Speed is needed: analyses and evaluations are expected promptly, not minutes or even hours later. This is where stream processing comes into play: data is processed as soon as it is known to the system. This started with the lambda architecture (cf. [1]), in which stream and batch processing run in parallel because stream processing alone could not guarantee consistent results. With today’s systems, it is also possible to achieve consistent results in near real time with stream processing only (cf. [2]).

Time Matters

An important aspect of streaming is time. Essentially, three different notions of time can be distinguished:

Event time: Time at which the event actually occurred
Ingestion time: Time at which the event was observed in the system
Processing time: Time at which the event was processed by the system

Fig. 1: Exemplary representation of event time and processing time, with late events (yellow, green, red) and out-of-order events (blue)

In practice, event time is usually more interesting than ingestion or processing time. The difference between event time and processing time can vary greatly. The reasons are numerous: network latencies, distributed systems, hardware failures or irregular data delivery. When processing by processing time, this difference does not matter. The data is analyzed based on the system time of the processor: if an event arrives at 12 o’clock, it is irrelevant that it actually occurred at 11 o’clock.
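
To make the difference concrete, here is a minimal sketch in plain Python. The events and the helper `count_per_hour` are made up for illustration and do not belong to any of the frameworks discussed. The same three events are counted per hour, once by event time and once by processing time.

```python
from collections import Counter
from datetime import datetime

# Hypothetical events as (event_time, processing_time) pairs.
EVENTS = [
    (datetime(2017, 3, 26, 10, 59), datetime(2017, 3, 26, 11, 0)),
    (datetime(2017, 3, 26, 11, 5), datetime(2017, 3, 26, 11, 6)),
    (datetime(2017, 3, 26, 11, 0), datetime(2017, 3, 26, 12, 0)),  # arrives one hour late
]


def count_per_hour(events, use_event_time):
    """Count events per hour, bucketed by event time or by processing time."""
    counts = Counter()
    for event_time, processing_time in events:
        ts = event_time if use_event_time else processing_time
        counts[ts.hour] += 1
    return dict(counts)


print(count_per_hour(EVENTS, use_event_time=True))   # {10: 1, 11: 2}
print(count_per_hour(EVENTS, use_event_time=False))  # {11: 2, 12: 1}
```

The late event ends up in the 12 o’clock bucket when counting by processing time, although it belongs to 11 o’clock.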

But this is not the normal use case: if an event occurred at 11 o’clock, I would like to process it according to the time at which it occurred. The questions here are: when do I know that I have received all events up to 11 o’clock? How long do I wait for events? There are several strategies and concepts to solve these problems. One of them is the Dataflow/Beam Model. Here, concepts such as watermarks, triggers and accumulation help:

Watermarks: When have I collected all the data?
Triggers: When should I trigger the calculation?
Accumulation: How do I merge individual calculations, for example when data arrives subsequently?

One could easily write a separate article about these three concepts. Tyler Akidau, the mind behind streaming at Google, has already summed this up, so his article is recommended for details [3].
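
As a rough illustration only, the interplay of watermark and trigger can be sketched in a few lines of Python. This is not how Beam or any of the frameworks implements it. The sketch assumes a simple heuristic watermark (largest event time seen so far minus an assumed maximum delay `MAX_DELAY`), fires a window once the watermark passes its end, and uses the simplest possible accumulation strategy: late events are set aside instead of being merged into an updated result. All names and values are hypothetical.

```python
from collections import defaultdict

WINDOW_SIZE = 10  # window length in seconds (hypothetical)
MAX_DELAY = 5     # assumed upper bound on out-of-orderness in seconds


def run(event_times):
    """Process event timestamps (in seconds) in arrival order.

    Returns (fired, late): fired lists (window_start, count) in firing order,
    late lists the events that arrived after their window had already fired.
    """
    open_windows = defaultdict(int)  # window_start -> number of events
    fired, late = [], []
    watermark = float("-inf")
    for t in event_times:
        start = t - t % WINDOW_SIZE
        if start + WINDOW_SIZE <= watermark:
            late.append(t)  # the window for this event is already closed
            continue
        open_windows[start] += 1
        # Watermark: no events older than this are expected any more.
        watermark = max(watermark, t - MAX_DELAY)
        # Trigger: fire every window whose end the watermark has passed.
        for s in sorted(open_windows):
            if s + WINDOW_SIZE <= watermark:
                fired.append((s, open_windows.pop(s)))
    return fired, late


fired, late = run([1, 4, 12, 8, 16, 3, 27])
print(fired)  # [(0, 3), (10, 2)]
print(late)   # [3]
```

The event at second 8 is out of order but still counted, because the watermark had not yet passed the end of its window. The event at second 3 arrives after the window fired and is therefore late.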

State & Window

Any non-trivial application will correlate incoming events with each other. This requires state in which previous events are stored temporarily. This state can be kept indefinitely or explicitly limited in time. An example of indefinitely stored state is a lookup table with metadata. An example of time-limited state is a window.

A window is used to aggregate and analyze data for a specific period of time. This is necessary in almost every application, since the data stream never ends. There are different types of windows:

Tumbling Window: Non-overlapping, fixed time segments
Sliding Window: Overlapping, fixed time segments
Session Window: Non-overlapping time segments of varying length, delimited by certain events or by a certain period of inactivity between two events

Fig. 2: Tumbling and sliding windows with a window size of 4 seconds and, for the sliding window, a sliding interval of 2 seconds. Within each window the values are summed.
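
The two window types from Fig. 2 can be sketched in plain Python as follows. The events are invented (they are not the values from the figure), and the functions `tumbling_sums` and `sliding_sums` are hypothetical helpers, not part of any framework API. Windows are half-open intervals [start, start + size).

```python
from collections import defaultdict

# Hypothetical (timestamp_in_seconds, value) events.
EVENTS = [(2, 1), (3, 2), (5, 3), (6, 4), (8, 5), (9, 6)]


def tumbling_sums(events, size):
    """Sum values per non-overlapping window [start, start + size)."""
    sums = defaultdict(int)
    for ts, value in events:
        sums[ts - ts % size] += value
    return dict(sorted(sums.items()))


def sliding_sums(events, size, slide):
    """Sum values per overlapping window; windows start at multiples of slide."""
    sums = defaultdict(int)
    for ts, value in events:
        start = ts - ts % slide   # latest window that contains ts
        while start > ts - size:  # walk back over all windows containing ts
            sums[start] += value
            start -= slide
    return dict(sorted(sums.items()))


print(tumbling_sums(EVENTS, size=4))          # {0: 3, 4: 7, 8: 11}
print(sliding_sums(EVENTS, size=4, slide=2))  # {0: 3, 2: 6, 4: 7, 6: 15, 8: 11}
```

With a window size of 4 seconds and a slide of 2 seconds, every event belongs to exactly two sliding windows but only to one tumbling window.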

Fig. 3: Session windows with an inactivity gap of at least two minutes between two events for a key.
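
Session windows as in Fig. 3 can be sketched in a similar way. The following is a simplified illustration that assumes all events are already available and can be sorted per key. A real streaming system has to build sessions incrementally and merge them when late events arrive. The function name and the sample data are made up.

```python
from collections import defaultdict


def sessions_per_key(events, gap):
    """Group (key, timestamp) events into sessions per key.

    A new session starts when at least `gap` seconds have passed
    since the previous event of the same key.
    """
    by_key = defaultdict(list)
    for key, ts in events:
        by_key[key].append(ts)

    result = {}
    for key, timestamps in by_key.items():
        sessions = []
        for ts in sorted(timestamps):
            if sessions and ts - sessions[-1][-1] < gap:
                sessions[-1].append(ts)  # still within the current session
            else:
                sessions.append([ts])    # inactivity gap reached: new session
        result[key] = sessions
    return result


events = [("a", 0), ("a", 60), ("b", 30), ("a", 200), ("b", 100), ("a", 250)]
print(sessions_per_key(events, gap=120))
# {'a': [[0, 60], [200, 250]], 'b': [[30, 100]]}
```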

For the definition of windows, the distinction between event time and processing time is important: windows based on processing time are very simple to implement, whereas windows based on event time need the event-time strategies described above so that they do not grow infinitely.

API & Runtime Environment

The first differences between the frameworks can be found in the API and the general processing model. A distinction is made between a native streaming approach and microbatching. In native streaming, incoming data is processed directly, while microbatching collects the incoming data for a certain time (typically 1 to 30 seconds) and then processes it together. The next microbatch can then be started either directly after the completion of the previous batch or only after the fixed interval has elapsed. In both cases, microbatching increases latency, but error handling is somewhat easier. The frequently cited advantage of very high throughput can now also be achieved by native streaming frameworks, which also offer more flexibility for windows and state.
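
Why microbatching adds latency can be seen in a small sketch. It is plain Python with invented timestamps and hypothetical function names, and it only models the grouping, not the actual processing of any framework: each record has to wait until its batch interval closes before it is processed.

```python
# Hypothetical (arrival_time_in_ms, value) events.
EVENTS = [(500, "a"), (1200, "b"), (4900, "c"), (5100, "d")]


def micro_batches(events, interval_ms):
    """Group events into batches that each cover interval_ms milliseconds."""
    batches = {}
    for ts, value in events:
        batches.setdefault(ts // interval_ms, []).append(value)
    return [batches[i] for i in sorted(batches)]


def added_wait_ms(ts, interval_ms):
    """Time a record waits until its batch interval closes."""
    return (ts // interval_ms + 1) * interval_ms - ts


print(micro_batches(EVENTS, interval_ms=5000))             # [['a', 'b', 'c'], ['d']]
print([added_wait_ms(ts, 5000) for ts, _ in EVENTS])       # [4500, 3800, 100, 4900]
```

In a native streaming model each record would be handed to the processing function as soon as it arrives, so this waiting time does not occur.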

What the developer mainly sees is the API. Here, too, two variants can be distinguished: a component-based API and a declarative, high-level API. With the former, the flow is described by several components (source -> processing 1 -> processing 2 -> sink); the latter describes the operations on the data (map, filter, reduce), similar to Scala collections or Java 8 streams. The component-based description provides more flexibility in the distribution of data streams, while the declarative API often already provides higher-order functions and automatic optimization.

Finally, the question is: where do the applications run? One can distinguish between two (surprise 🙂) basic alternatives. Some frameworks need a dedicated cluster consisting of master nodes and worker nodes. These clusters then also handle resource management and error handling, but can also delegate this to other tools (for example, YARN or Mesos). Other frameworks come as a simple library that can be integrated into your own application. Running and scaling the application must then be handled by other tools. Here you have full flexibility, from running a plain jar file to Docker to Mesos or YARN.
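
The two API styles can be contrasted with a toy example in plain Python. Neither half uses a real framework API. The classes `KeepEven`, `Double` and `SumSink` are invented for this sketch and only mimic the idea of wiring components together, while the declarative half uses Python's built-in `filter`, `map` and `functools.reduce`.

```python
from functools import reduce


# Declarative style: describe the operations on the data.
def declarative(values):
    evens = filter(lambda v: v % 2 == 0, values)
    doubled = map(lambda v: v * 2, evens)
    return reduce(lambda a, b: a + b, doubled, 0)


# Component style: describe the topology (source -> processing 1 -> processing 2 -> sink).
class Component:
    def __init__(self, downstream=None):
        self.downstream = downstream

    def emit(self, value):
        if self.downstream is not None:
            self.downstream.process(value)


class KeepEven(Component):
    def process(self, value):
        if value % 2 == 0:
            self.emit(value)


class Double(Component):
    def process(self, value):
        self.emit(value * 2)


class SumSink(Component):
    def __init__(self):
        super().__init__()
        self.total = 0

    def process(self, value):
        self.total += value


def component_based(values):
    sink = SumSink()
    topology = KeepEven(Double(sink))  # explicit wiring of the components
    for v in values:                   # this loop plays the role of the source
        topology.process(v)
    return sink.total


print(declarative([1, 2, 3, 4]))      # 12
print(component_based([1, 2, 3, 4]))  # 12
```

Both variants compute the same result. The component style makes the wiring explicit, which is where the extra flexibility in distributing the stream comes from, while the declarative style leaves it to the runtime.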

Distributed systems are unreliable!

All three frameworks specialize in processing large amounts of data and achieve this through horizontal scaling. Such distributed systems are inherently unreliable: single nodes can fail, the network can be unreliable, or the database to which the results are to be written can be unavailable.

For this reason, each framework has different mechanisms to achieve certain guarantees. These range from microbatching, in which small batches are repeated, via acknowledgments for individual records, to transactional updates on source and sink. The guarantees achieved are then usually at-least-once or exactly-once. Since exactly-once is often difficult to achieve, at-least-once guarantees with idempotent operations are often sufficient in terms of both speed and error tolerance.
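
Why idempotent operations make at-least-once delivery acceptable can be shown with a small simulation. This is a sketch under the assumption that every record carries a unique ID. Deduplicating by that ID is just one possible way to make an update idempotent, and all names are hypothetical.

```python
def deliver_at_least_once(records, redelivered):
    """Simulate at-least-once delivery: records at the given indices arrive twice."""
    for i, record in enumerate(records):
        yield record
        if i in redelivered:
            yield record  # duplicate, e.g. after a missing acknowledgment


def apply_naive(store, record):
    """Non-idempotent update: a duplicate is counted twice."""
    _, key, amount = record
    store[key] = store.get(key, 0) + amount


def apply_idempotent(store, seen_ids, record):
    """Idempotent update: a record ID that was already applied is ignored."""
    record_id, key, amount = record
    if record_id in seen_ids:
        return
    seen_ids.add(record_id)
    store[key] = store.get(key, 0) + amount


# Hypothetical records as (record_id, key, amount).
RECORDS = [(1, "a", 10), (2, "a", 5), (3, "b", 7)]

naive, safe, seen = {}, {}, set()
for rec in deliver_at_least_once(RECORDS, redelivered={1}):
    apply_naive(naive, rec)
    apply_idempotent(safe, seen, rec)

print(naive)  # {'a': 20, 'b': 7}
print(safe)   # {'a': 15, 'b': 7}
```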

Isn’t there something that can help us?

Time handling, state & windows, a runtime environment, all in a distributed fashion: streaming applications are complex. There are a number of projects to help with these problems. Three of them are briefly presented here:

Apache Spark (Streaming)
Apache Spark is currently one of the most popular projects in the streaming field. It started as a better MapReduce; support for streaming data was added later. Spark Streaming relies on microbatching with a declarative API. At the moment, only processing time is fully supported, but with the new Structured Streaming API, support for event-time processing has been gradually expanded since version 2.0. The same is true for window support. The state is stored locally in memory or on disk and is regularly backed up by checkpointing. Since Spark now ships with every major Hadoop distribution, it is very widely deployed. There is also a large ecosystem with many tools and connectors.

Apache Flink
When it comes to event-time processing, Apache Flink is currently the first choice. Watermarks and triggers are supported, as are different window operations. Flink pursues a native streaming approach and thus achieves low latencies. Like Spark Streaming, it offers a declarative API, with the option of using so-called rich functions, in which, for example, state can be used. Unlike in Spark, the state backend can be chosen from different implementations: in-memory, hard disk or RocksDB. Flink is slightly younger than Spark but is gaining in popularity. Likewise, the community and the ecosystem are growing steadily but are not yet as large as Spark’s.

Apache Kafka Streams
The streaming framework from the Kafka ecosystem is the youngest representative in this overview. It builds on many concepts already present in Kafka, such as scaling by partitioning topics. Partly for this reason, it comes as a lightweight library that can be integrated into an application. The application can then be operated as desired: standalone, in an application server, as a Docker container or via a resource manager such as Mesos. Flink and Spark, on the other hand, always need a cluster, either built with the frameworks’ own tooling or with YARN / Mesos. Kafka Streams, however, is limited to Kafka as a source and as a sink. But you can connect a Kafka topic to other systems through Kafka Connect, with over 60 available connectors. In addition to a declarative API, Kafka Streams also has a component-oriented API, rudimentary support for event time, and RocksDB as a state implementation. While Kafka is already very mature and often used in connection with Flink and Spark, the streaming component is still quite young, so its community and adoption are rather small. It is, however, to be expected that both will grow rapidly.

Update:

It should be noted that Kafka Streams does not use the concepts of the Beam Model to tackle the challenges of event-time processing. Kafka Streams is built on the concepts of KTables and KStreams, which help it provide event-time processing.
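
The difference between the two abstractions can be hinted at without any Kafka code. The following plain-Python sketch does not use the Kafka Streams API. It only illustrates the commonly described interpretation that a KStream treats every record as an independent fact, while a KTable treats the records as a changelog in which a later record for a key replaces the earlier one. The function names and the sample records are invented.

```python
def sum_as_stream(records):
    """Record-stream view: every record is an independent fact, so all values count."""
    return sum(value for _, value in records)


def as_table(records):
    """Changelog view: a later record for a key replaces the earlier one."""
    table = {}
    for key, value in records:
        table[key] = value
    return table


# Hypothetical (key, value) records in arrival order.
RECORDS = [("alice", 1), ("alice", 3)]

print(sum_as_stream(RECORDS))           # 4
print(sum(as_table(RECORDS).values()))  # 3
```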

And what suits me?

Finally, the question is: Which framework suits me? If event-time processing is required and you do not mind working with the concepts of the Beam Model, you could go with Apache Flink. Another advantage is the low latency. The most important systems (Kafka, Cassandra, Elasticsearch, SQL databases) can be integrated relatively easily.

The low latency and easy-to-use event-time support also apply to Kafka Streams. So if Kafka is already in use ~~and the processing is rather simple, without complex requirements for event processing~~ (Streams can also be used for more complex stream processing), Kafka Streams is a good alternative. In that case you have to connect other systems, such as databases, via Kafka Connect and take care of the runtime environment yourself. This can also be an advantage if you can use existing tools, for example from the Docker ecosystem.

And Spark? If event time is not relevant and latencies in the seconds range are acceptable, Spark is the first choice. It is stable and almost any type of system can be easily integrated. In addition it comes with every Hadoop distribution. Furthermore the code used for batch applications can also be used for the streaming applications as the API is the same.

Only with very large state can Spark cause problems. The support for event time is expanded with Spark 2.1.

Conclusion

Stream processing frameworks significantly simplify the processing of large amounts of data. The presented frameworks primarily solve problems in the area of distributed processing, so that easy-to-scale solutions can be developed. Equally important are the different aspects of time handling, which all frameworks support in some way.

That is what distinguishes those systems from libraries such as Akka Streams, RxJava, or Vert.x. The presented frameworks are mainly located in the Big and Fast Data area, while the libraries can also be used to build smaller, more reactive applications, but usually without native support for event time and clustering.

It remains to be noted that the presented frameworks can all help with current challenges in the fast data area and also support new architectures beyond the well-known lambda architecture. However, the complexity of these distributed systems should in no way be underestimated. Nevertheless, it can be assumed that both the adoption and the functionality of these systems will continue to grow.

Links

[1] http://nathanmarz.com/blog/how-to-beat-the-cap-theorem.html

[2] https://www.oreilly.com/ideas/questioning-the-lambda-architecture

[3] https://www.oreilly.com/ideas/the-world-beyond-batch-streaming-102

Blog author

Matthias Niehoff

Head of Data
