Event time processing in Apache Spark and Apache Flink

19.4.2017 | 9 minutes of reading time

With the release of Spark 2.1, the event-time capabilities of Spark Structured Streaming have been expanded. It is time to take a closer look at the state of support and compare it with Apache Flink, which comes with broad support for event-time processing. In this article, I describe how three basic concepts of event-time processing (watermarks, triggers and accumulation) work and then compare their implementation in Spark and Flink. The comparison is based on Spark 2.1 and Flink 1.2.

For a broader comparison of the frameworks, refer to Distributed Stream Processing Frameworks for Fast & Big Data. The linked article also covers further basics that are assumed for this comparison, for example stateful and windowed stream processing as well as the difference between event time and processing time.

The central problems of event-time processing are unordered and delayed events. Events are not observed by the system at exactly the time they occur, and the difference between event time and processing time is not constant. Especially for time-based evaluations, this increases the complexity of the system and thus of the code. If, for example, all events between 12:00 and 13:00 are to be processed (a typical window operation), several questions arise: How should late events be handled? How long should the system wait for delayed entries? How is the result updated when delayed entries arrive? Google has presented various solutions for these problems with the Dataflow API, which has since been released as Apache Beam. Neither Spark nor Flink explicitly follows the Dataflow API, but the concepts are similar and can therefore be compared.

The concepts used in the Dataflow API are one way to tackle event-time processing problems. Kafka Streams, for example, follows a different strategy for event-time support. More on this hopefully in a following blog post.

Watermarks

Watermarks indicate up to which point in time events have already been processed. Like a water level, they show the “time level” reached so far. When a watermark is reached, the result of the calculation is materialized. For example, if the watermark is defined as the time of the newest event minus a fixed buffer of 30 seconds, it states: “It is assumed that all events up to time x have now arrived.” Here x is the watermark, defined as time(newest event) – 30 seconds.
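
The fixed-buffer definition above can be sketched in a few lines of plain Scala. This is a minimal illustration, not how Spark or Flink implement watermarks; the class name `FixedLagWatermark`, the demo object and the use of epoch milliseconds are assumptions made for this example.

```scala
// Sketch of a fixed-lag watermark: watermark = time(newest event) - buffer.
// All names are illustrative; timestamps are epoch milliseconds.
final class FixedLagWatermark(bufferMillis: Long) {
  private var maxEventTime: Long = Long.MinValue

  /** Record an observed event timestamp; out-of-order events never move the maximum back. */
  def observe(eventTimeMillis: Long): Unit =
    maxEventTime = math.max(maxEventTime, eventTimeMillis)

  /** Current watermark, or None until the first event has been seen. */
  def current: Option[Long] =
    if (maxEventTime == Long.MinValue) None else Some(maxEventTime - bufferMillis)
}

object FixedLagWatermarkDemo {
  def run(): Option[Long] = {
    val wm = new FixedLagWatermark(bufferMillis = 30000L)
    // Events arrive out of order: 100 s, 90 s, 120 s, 110 s.
    Seq(100000L, 90000L, 120000L, 110000L).foreach(wm.observe)
    wm.current // newest event is at 120 s, so the watermark is at 90 s
  }
}
```

Because the watermark is derived from the maximum timestamp seen so far, it only ever moves forward, even when events arrive out of order.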

There are also heuristic watermarks, where the buffer is adapted dynamically, for example based on empirical measurements. The system could observe, for instance, that events at night are always received with a noticeable delay, and increase the buffer accordingly. Likewise, the expected delay could be extrapolated from the delay observed during the last hour. In addition, there is the (mostly unrealistic) perfect watermark, where each event is processed directly when it occurs.
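
One possible heuristic is sketched below: the buffer follows the largest delay observed in the most recent events. The policy itself, the name `HeuristicWatermark` and the `historySize` parameter are assumptions chosen for illustration; real systems may use very different heuristics.

```scala
// Heuristic watermark sketch: the buffer follows the largest delay seen in the
// last `historySize` events. The policy and all names are assumptions.
final case class HeuristicWatermark(
    historySize: Int,
    recentDelays: Vector[Long] = Vector.empty,
    maxEventTime: Long = Long.MinValue) {

  /** Record an event with its event time and the time it was received. */
  def observe(eventTime: Long, arrivalTime: Long): HeuristicWatermark = {
    val delay = math.max(0L, arrivalTime - eventTime)
    val delays = (recentDelays :+ delay).takeRight(historySize)
    copy(recentDelays = delays, maxEventTime = math.max(maxEventTime, eventTime))
  }

  /** Buffer currently assumed: the largest recently observed delay. */
  def buffer: Long = if (recentDelays.isEmpty) 0L else recentDelays.max

  /** Watermark = newest event time minus the adaptive buffer. */
  def current: Option[Long] =
    if (maxEventTime == Long.MinValue) None else Some(maxEventTime - buffer)
}

object HeuristicWatermarkDemo {
  def run(): (Long, Option[Long]) = {
    // (eventTime, arrivalTime) pairs in milliseconds
    val events = Seq((1000L, 1500L), (2000L, 6000L), (7000L, 7200L))
    val wm = events.foldLeft(HeuristicWatermark(historySize = 2)) {
      case (state, (eventTime, arrivalTime)) => state.observe(eventTime, arrivalTime)
    }
    (wm.buffer, wm.current)
  }
}
```

Note that a growing buffer can move this watermark backwards; a production implementation would typically also enforce that the emitted watermark never decreases.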

In the context of watermarks, the “allowed lateness” is often mentioned. For Spark, for example, this is the watermark buffer described above. In the Dataflow Model, however, allowed lateness is an additional period after the watermark, in which events are not ignored, but can subsequently influence the result. Therefore rules are defined – for example by means of triggers – how and how often events are to be processed that occur within allowed lateness.

Regardless of the interpretation of allowed lateness, all data for the calculation of the result must remain stored until the allowed lateness has elapsed in order to update the results. The data is then automatically deleted.
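
Under the Dataflow interpretation, an event for a given window falls into one of three categories, depending on where the watermark stands when the event arrives. The following sketch makes that explicit; the type and function names are hypothetical, and the exact boundary conditions (inclusive or exclusive) differ between frameworks.

```scala
// Classifying an arriving event for one window (Dataflow interpretation).
// Names and boundary handling are assumptions for illustration.
sealed trait Arrival
case object InTime extends Arrival         // the watermark has not yet passed the window end
case object WithinLateness extends Arrival // after the watermark, but within allowed lateness
case object TooLate extends Arrival        // window state may already be deleted; event is ignored

object Lateness {
  def classify(watermark: Long, windowEnd: Long, allowedLateness: Long): Arrival =
    if (watermark < windowEnd) InTime
    else if (watermark < windowEnd + allowedLateness) WithinLateness
    else TooLate
}
```

The third case is also the point at which the window's stored data can be discarded.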

Trigger

While watermarks indicate the current state of the data received, triggers materialize the calculation. The watermark trigger is the simplest variant: it fires as soon as the watermark reaches the end of the window, for example 1:00 pm for a window from 12 pm to 1 pm. With the watermark defined above (a fixed buffer of 30 seconds), this is the case once an event with a timestamp of 13:00:30 has been observed.
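
The arithmetic of the watermark trigger can be checked with a small sketch. The object `WatermarkTrigger` and its `fires` function are hypothetical names; the watermark is again assumed to be the newest event time minus a fixed buffer.

```scala
import java.time.{Duration, LocalTime}

// Watermark trigger sketch: fire once the watermark has reached the window end.
// Uses times within a single day; names are illustrative.
object WatermarkTrigger {
  def fires(newestEvent: LocalTime, buffer: Duration, windowEnd: LocalTime): Boolean = {
    val watermark = newestEvent.minus(buffer)
    !watermark.isBefore(windowEnd)
  }
}
```

With a 30-second buffer and a window ending at 13:00, `fires` returns `false` for a newest event at 13:00:29 and `true` for one at 13:00:30.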

Other triggers are possible:

Based on processing time, for example every 2 minutes
Based on the number of events, for example every 100 events
On special events, for example the end of a file, a technical flush event, or based on the content of the event
A combination of the three above

Triggers are typically used to determine intermediate results before the watermark is reached, or to update the results for delayed events. It would be possible, for example, with a window of 10 minutes to trigger every minute, then again at the watermark and finally at each delayed event until the allowed lateness is reached.
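
The combination described above (early firings on a timer, one firing at the watermark, then one firing per late event until the allowed lateness has elapsed) can be sketched as a small state machine. All type names here are hypothetical and the model is deliberately simplified to a single window.

```scala
// Composite trigger sketch for a single window; all names are assumptions.
sealed trait Signal
case object TimerTick extends Signal                          // processing-time timer, e.g. every minute
final case class WatermarkAdvanced(time: Long) extends Signal // the watermark moved to `time`
final case class Element(eventTime: Long) extends Signal      // an event for this window arrived

sealed trait Firing
case object EarlyFiring extends Firing
case object OnTimeFiring extends Firing
case object LateFiring extends Firing

final class CompositeTrigger(windowEnd: Long, allowedLateness: Long) {
  private var watermark = Long.MinValue
  private var firedOnTime = false

  def onSignal(signal: Signal): Option[Firing] = signal match {
    // Intermediate results before the watermark reaches the window end.
    case TimerTick if !firedOnTime => Some(EarlyFiring)
    // Exactly one firing when the watermark passes the window end.
    case WatermarkAdvanced(t) =>
      watermark = math.max(watermark, t)
      if (!firedOnTime && watermark >= windowEnd) { firedOnTime = true; Some(OnTimeFiring) }
      else None
    // One firing per late event, as long as allowed lateness has not elapsed.
    case Element(_) if firedOnTime && watermark < windowEnd + allowedLateness => Some(LateFiring)
    case _ => None
  }
}

object CompositeTriggerDemo {
  def run(): List[Firing] = {
    val trigger = new CompositeTrigger(windowEnd = 600L, allowedLateness = 60L)
    val signals = Seq[Signal](
      TimerTick, Element(5L), TimerTick,
      WatermarkAdvanced(600L), Element(590L),
      WatermarkAdvanced(700L), Element(595L))
    signals.flatMap(s => trigger.onSignal(s).toList).toList
  }
}
```

In the demo, the last element arrives after the allowed lateness has elapsed and therefore causes no further firing.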

Accumulation

If the result is calculated more than once by the use of triggers, the developer must define how the individual partial results are handled. There are three variants:

Discarding: With each trigger, only the new partial results are passed on and the results obtained up to this point are subsequently deleted.
Accumulating: The current results are updated and passed on each trigger. The results are not deleted.
Accumulating & Retracting: Like accumulating, but with additional information about the previous result, so that subsequent operations can easily correct their results. This mode is provided in the Dataflow Model, but is not yet implemented by any framework.

The choice of the appropriate accumulation strategy depends strongly on the subsequent processing and the final sink. If the sink is able to update results, the accumulating mode can be used. If, however, every partial result may serve only once as input for the next step, the discarding mode must be used.
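
The difference between the three modes is easiest to see on a concrete series of firings. The sketch below assumes a simple count per window, where `newPerFiring` holds the number of events that arrived since the previous firing; the object and function names are made up for this example.

```scala
// What each accumulation mode emits for a series of trigger firings.
// Input: number of new events since the previous firing. Names are illustrative.
object Accumulation {
  /** Discarding: each firing emits only what arrived since the previous firing. */
  def discarding(newPerFiring: Seq[Int]): Seq[Int] = newPerFiring

  /** Accumulating: each firing emits the running total for the window. */
  def accumulating(newPerFiring: Seq[Int]): Seq[Int] =
    newPerFiring.scanLeft(0)(_ + _).tail

  /** Accumulating & retracting: the running total plus the previous value to retract. */
  def accumulatingAndRetracting(newPerFiring: Seq[Int]): Seq[(Int, Option[Int])] = {
    val totals = accumulating(newPerFiring)
    val previous = Option.empty[Int] +: totals.dropRight(1).map(Option(_))
    totals.zip(previous)
  }
}
```

For three firings with 3, 2 and 4 new events, discarding emits 3, 2, 4, accumulating emits 3, 5, 9, and accumulating & retracting emits 3, then 5 (retract 3), then 9 (retract 5). A downstream sum over the discarding output yields the correct total of 9, whereas summing the accumulating output would double count.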

Support in Apache Spark Streaming

Spark does not need a special mode for event-time processing. Internally, it is handled no differently from processing time. To define an event-time window in Spark, you first group the stream by the window:

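import org.apache.spark.sql.functions.window
// the $"..." column syntax additionally requires: import spark.implicits._ (spark being the SparkSession)

val words = ... // streaming DataFrame with schema { timestamp: Timestamp, word: String }

// Group by window and word, calculate the count for each group
val windowedCounts = words.groupBy(
    window($"timestamp", "10 minutes", "5 minutes"),
    $"word"
).count()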

This is not significantly different from a groupBy on a key, except that the key is a time window. In this case it is a window of 10 minutes length with a sliding interval of 5 minutes. In addition, the entries can be grouped by a domain-specific key, in this case the “word”. With the count at the end you get a word count per 10-minute window.
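
With a window length of 10 minutes and a sliding interval of 5 minutes, every event belongs to two overlapping windows. The following sketch computes these windows for a timestamp; it is not Spark's implementation, the name `SlidingWindows.assign` is hypothetical, and it assumes windows aligned to multiples of the slide with an inclusive start and an exclusive end.

```scala
// Which sliding windows contain a given timestamp?
// Assumes windows aligned to multiples of `slide`, intervals are [start, end).
object SlidingWindows {
  def assign(timestamp: Long, size: Long, slide: Long): Seq[(Long, Long)] = {
    // Start of the latest window that begins at or before the timestamp.
    val lastStart = timestamp - java.lang.Math.floorMod(timestamp, slide)
    Iterator.iterate(lastStart)(_ - slide)
      .takeWhile(start => start + size > timestamp) // window must still cover the timestamp
      .map(start => (start, start + size))
      .toList
      .reverse
  }
}
```

Measured in minutes, `SlidingWindows.assign(12, 10, 5)` yields the windows (5, 15) and (10, 20), so an event at minute 12 is counted in both.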

In the example above, all data is stored indefinitely so that the result can be updated even in the case of delayed events. With watermarks this time can be limited:

val windowedCounts = words
   .withWatermark("timestamp", "10 minutes")
   .groupBy(
       window($"timestamp", "10 minutes", "5 minutes"),
       $"word")
   .count()

Now Spark only waits 10 minutes for delayed data. The data for this window is then deleted. Currently only a watermark with a fixed allowed delay can be used. In Spark, the watermark is currently equal to the allowed lateness.

Spark currently implements two different modes to output the result:

In append mode, the result of the window is output after the watermark has been reached, i.e. 10 minutes after the end of the time window. Early triggers, i.e. firing when the end of the window is reached, are not possible. The watermark thus adds additional latency.
In complete mode, the result calculated so far is output with every trigger. However, Spark currently only supports triggers based on processing time. Time-independent or composite triggers are currently not supported.

In conjunction with watermarks, only append mode can be used, since complete mode requires all existing data to remain available, so nothing can be deleted once the watermark has been reached.

Spark thus offers rudimentary support for watermarks (fixed delay only) and triggers (processing time only). As for accumulation, the output modes implemented in Spark correspond most closely to the accumulating mode of the Dataflow API, since the complete result for a time window is always determined after the allowed delay has expired.

Support in Apache Flink

It must first be indicated to Flink that the processing is to take place on the event time. This is done by

env.setStreamTimeCharacteristic(TimeCharacteristic.EventTime);

Depending on the time characteristic, for example, the different window implementations behave differently. In addition, the event time of the stream source is taken into account. If the source does not provide an event time, the event time must be manually extracted from the event using timestamp assigners. The watermark must also be defined for these events.

stream.assignTimestampsAndWatermarks(new TimestampAndWatermarkAssigner());

Developers can implement the assigner entirely themselves or extend predefined implementations. In particular, different implementations are possible for watermarks:

With a fixed distance
With a dynamic but limited distance
Or based on specific events.

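import org.apache.flink.streaming.api.functions.AssignerWithPeriodicWatermarks
import org.apache.flink.streaming.api.watermark.Watermark

class FixedWatermarkGenerator extends AssignerWithPeriodicWatermarks[SomeEvent] {

   // use the timestamp carried by the event as its event time
   override def extractTimestamp(element: SomeEvent, previousElementTimestamp: Long): Long = {
       element.getEventTimestamp
   }

   override def getCurrentWatermark(): Watermark = {
       // the watermark is 10s behind the current (processing) time
       new Watermark(System.currentTimeMillis() - 10000)
   }
}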

The watermark is used to determine the time at which most of the data for a window has been processed. At this time, the calculation is executed. In addition to the watermark, an allowed lateness can be specified. This is the period of time by which an event may be delayed beyond the watermark. The allowed lateness is always defined in conjunction with a window operation.

stream
   .assignTimestampsAndWatermarks(new TimestampAndWatermarkAssigner())
   .keyBy(event -> event.someKey)
   .window(SlidingEventTimeWindows.of(Time.minutes(15), Time.minutes(5)))
   .allowedLateness(Time.minutes(2))
   .apply() // window function omitted

This defines a sliding window with a length of 15 minutes, a sliding interval of 5 minutes and an allowed lateness of 2 minutes.

When it comes to calculating a window, Flink offers additional triggers besides the watermark. These triggers can react to a certain count (for example every 100 events), to time (either event or processing time) or to a mixture of both. It is also possible to dynamically register triggers that are executed in the future, for example a certain time after an event.

Last but not least the question about the accumulators: By default the complete updated data is passed on to subsequent operations within the streaming application. Instead, a fire and purge trigger can be used, in which all current data is deleted after the trigger and all other triggers only pass on the new data. This is then effectively a discarding accumulator.

As a result, Flink offers extensive support during event processing with its various watermarks and flexible triggers as well as windows. Despite the flexibility, it is possible to implement standard cases without major effort.

Summary and recommendation

It is obvious that Flink has been working on support for event-time processing for much longer. This explains why significantly more concepts are already supported. In addition, Flink continuously works on supporting the Dataflow concepts, for example with the implementation of the retractable accumulator.

Spark, on the other hand, has only started to support event processing with Structured Streaming. So far the basic principles have been created and the first concepts have been implemented with version 2.1. It is to be expected that Spark will implement the essential functions in the course of the year. But if you need stream processing with event processing now, you should start with Flink.

Blog author

Matthias Niehoff

Head of Data
