Crossing the Streams – Joins in Apache Kafka

15.2.2017 | 14 minutes of reading time

Version 0.10.0 of the popular distributed streaming platform Apache Kafka saw the introduction of Kafka Streams. In its initial release, the Streams API enabled stateful and stateless Kafka-to-Kafka message processing using concepts such as map, flatMap, filter or groupBy that many developers are familiar with these days. In Kafka 0.10.1, it started to support “Interactive Queries”, an API that allows querying stateful stream transformations without going through another Kafka topic.

In this article, we will talk about a specific kind of streaming operation – the joining of streams. We will begin with a brief walkthrough of some core concepts. Then we will take a look at the kinds of joins that the Streams API permits. Following that, we’ll walk through each possible join by looking at the output of an established example. At the end, you should be aware of what kinds of joins are possible in Kafka Streams and where the caveats lie.

A brief introduction to some core concepts

The central component of Kafka is a distributed pub-sub message broker where producers send messages – key-value pairs – to topics which in turn are polled and read by consumers. Each topic is partitioned and the partitions are distributed among brokers. The excellent Kafka documentation explains it best.

There are two main abstractions in the Streams API. A KStream is a stream of key-value pairs – basically a close model of a Kafka topic. The records in a KStream either come directly from a topic or have gone through some kind of transformation. For example, its filter method takes a predicate and returns another KStream that only contains the elements that satisfy the predicate. KStreams are stateless, but they allow for aggregation by turning them into the other core abstraction, the KTable, which is often described as a “changelog stream”.

A KTable statefully holds the latest value for a given message key and reacts automatically to newly incoming messages.
A nice example is counting visits to a website by unique IP addresses. Let’s assume we have a Kafka topic containing messages of the following type: (key=IP, value=timestamp). A KStream contains all visits by all IPs, even if an IP is recurring, so a count on such a KStream includes repeat visits. A KTable, on the other hand, only contains the latest message per key, so a count on the KTable represents the number of distinct IP addresses that visited the site.
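To make the difference concrete, here is a small plain-Java simulation of the visit-counting example. It uses no Kafka classes at all: the class `VisitCounts` and its methods are hypothetical, the KStream is modeled as a plain list of records, and the KTable as a map that keeps only the latest value per key.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Plain-JDK simulation of the visit-counting example (no Kafka classes involved).
final class VisitCounts {

    // key = IP address, value = visit timestamp in milliseconds
    record Visit(String ip, long timestamp) {}

    // KStream-like count: every record counts, including repeat visits.
    static long streamCount(List<Visit> visits) {
        return visits.size();
    }

    // KTable-like view: a newer record for the same key overwrites the older one.
    static Map<String, Long> toTable(List<Visit> visits) {
        Map<String, Long> table = new LinkedHashMap<>();
        for (Visit v : visits) {
            table.put(v.ip(), v.timestamp());
        }
        return table;
    }

    // Count over the table: one entry per distinct IP.
    static long tableCount(List<Visit> visits) {
        return toTable(visits).size();
    }

    public static void main(String[] args) {
        List<Visit> visits = List.of(
                new Visit("10.0.0.1", 1_000L),
                new Visit("10.0.0.2", 2_000L),
                new Visit("10.0.0.1", 3_000L),  // repeat visit by the same IP
                new Visit("10.0.0.3", 4_000L));
        System.out.println("stream count: " + streamCount(visits)); // stream count: 4
        System.out.println("table count: " + tableCount(visits));   // table count: 3
        System.out.println(toTable(visits).get("10.0.0.1"));        // 3000
    }
}
```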

KTables and KStreams can also be windowed. Regarding the example, this means we could add a time dimension to our stateful operations. To enable windowing, Kafka 0.10 changed the Kafka message format to include a timestamp. This timestamp can either be CreateTime or LogAppendTime. CreateTime is set by the producer, either manually or automatically. LogAppendTime is the time at which the broker appends the message to the log. Next up: joins.
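A minimal sketch of such a time dimension, assuming fixed-size, non-overlapping (tumbling) windows: each visit is assigned to the window that contains its timestamp, and distinct IPs are counted per window. `TumblingWindows` and its methods are hypothetical names; the code only mimics the idea in plain Java rather than using the Kafka Streams windowing API, and it assumes non-negative timestamps.

```java
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;

// Plain-JDK sketch of tumbling windows; not the Kafka Streams windowing API.
final class TumblingWindows {

    record Visit(String ip, long timestamp) {}

    // A record belongs to the window starting at the last multiple of sizeMs
    // at or before its (non-negative) timestamp.
    static long windowStart(long timestamp, long sizeMs) {
        return timestamp - (timestamp % sizeMs);
    }

    // Distinct IPs per window, keyed by window start (TreeMap keeps them sorted).
    static Map<Long, Integer> distinctIpsPerWindow(List<Visit> visits, long sizeMs) {
        Map<Long, Set<String>> ipsByWindow = new TreeMap<>();
        for (Visit v : visits) {
            ipsByWindow.computeIfAbsent(windowStart(v.timestamp(), sizeMs), w -> new HashSet<>())
                       .add(v.ip());
        }
        Map<Long, Integer> counts = new TreeMap<>();
        ipsByWindow.forEach((start, ips) -> counts.put(start, ips.size()));
        return counts;
    }

    public static void main(String[] args) {
        long hour = 3_600_000L;
        List<Visit> visits = List.of(
                new Visit("10.0.0.1", 1_000L),
                new Visit("10.0.0.1", 2_000L),         // same IP, same hour
                new Visit("10.0.0.2", 5_000L),
                new Visit("10.0.0.1", hour + 1_000L)); // same IP, next hour
        System.out.println(distinctIpsPerWindow(visits, hour)); // {0=2, 3600000=1}
    }
}
```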

Joins

Taking a leaf out of SQL’s book, Kafka Streams supports three kinds of joins:

Inner Joins
- Emits an output when both input sources have records with the same key.
Outer Joins
- Emits an output for each record in either input source. If only one source contains a key, the other side is null.
Left Joins
- Emits an output for each record in the left or primary input source. If the other source does not have a value for a given key, it is set to null.
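The three join kinds can be sketched on two in-memory maps keyed by ID, ignoring time and windows entirely. This only illustrates the key-matching rules above: `JoinKinds`, `Pair` and the method names are hypothetical and not part of the Kafka Streams API.

```java
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.Map;
import java.util.Set;

// Key-matching rules of inner, left and outer joins on two maps (no time, no windows).
final class JoinKinds {

    // Joined value; either side may be null.
    record Pair(String left, String right) {}

    // Inner: only keys present on both sides.
    static Map<String, Pair> inner(Map<String, String> l, Map<String, String> r) {
        Map<String, Pair> out = new LinkedHashMap<>();
        l.forEach((k, v) -> {
            if (r.containsKey(k)) {
                out.put(k, new Pair(v, r.get(k)));
            }
        });
        return out;
    }

    // Left: every left key; the right side is null when missing.
    static Map<String, Pair> left(Map<String, String> l, Map<String, String> r) {
        Map<String, Pair> out = new LinkedHashMap<>();
        l.forEach((k, v) -> out.put(k, new Pair(v, r.get(k))));
        return out;
    }

    // Outer: every key from either side; the missing side is null.
    static Map<String, Pair> outer(Map<String, String> l, Map<String, String> r) {
        Set<String> keys = new LinkedHashSet<>(l.keySet());
        keys.addAll(r.keySet());
        Map<String, Pair> out = new LinkedHashMap<>();
        for (String k : keys) {
            out.put(k, new Pair(l.get(k), r.get(k)));
        }
        return out;
    }

    public static void main(String[] args) {
        Map<String, String> views = Map.of("0", "view-0", "3", "view-3");
        Map<String, String> clicks = Map.of("0", "click-0", "4", "click-4");
        System.out.println(inner(views, clicks));          // {0=Pair[left=view-0, right=click-0]}
        System.out.println(left(views, clicks).get("3"));  // Pair[left=view-3, right=null]
        System.out.println(outer(views, clicks).get("4")); // Pair[left=null, right=click-4]
    }
}
```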

Another important aspect to consider are the input types. The following table shows which operations are permitted between KStreams and KTables:

Primary Type	Secondary Type	Inner Join	Outer Join	Left Join
KStream	KStream	Allowed	Allowed	Allowed
KTable	KTable	Allowed	Allowed	Allowed
KStream	KTable	Not Allowed (coming in 0.10.2)	Not Allowed	Allowed

As the table shows, all joins are permitted between equal types. The only permitted inter-type join is a left join between a KStream and a KTable. A possible use case for this join is the matching of incoming streaming data to a KTable that contains some kind of master data, like matching an incoming user id to a more detailed profile.

In total there are seven possible join types. Let’s look at them in detail. Disclaimer: the table of possible joins and the join semantics are only valid for Kafka 0.10.0 and 0.10.1. They will be improved in Kafka 0.10.2 (see KIP-77).

Example

We are going to use a consistent example to demonstrate the differences in the joins. It is based on the online advertising domain. There is one Kafka topic that contains view events of particular ads and another one that contains click events based on those ads. View and click events share an ID that serves as the key in both topics.

In the examples, custom-set event times provide a convenient way to simulate the timing within the streams. We will look at the following 7 scenarios with IDs 0 to 6:

Scenario 0: a click event arrives 1000 ms after the view
Scenario 1: a click event arrives 10,000 ms after the view
Scenario 2: a view event arrives 1000 ms after the click
Scenario 3: there is a view event but no click
Scenario 4: there is a click event but no view event
Scenario 5: there are two view events at distinct times, and a click event 1000 ms after the first view. The view events are denoted as 5.1 and 5.2
Scenario 6: there is a view event followed by a click event after 500 ms and another one after 1200 ms. The clicks are denoted as 6.1 and 6.2 in the following.

This visualization shows these streams:

Inner Join KStream-KStream

All KStream-KStream joins are windowed, so the developer has to specify how long that window should be and whether the relative order of the elements of both streams matters (i.e., happens-before/after semantics). The rationale behind this forced windowing is as follows: a KStream is stateless. To execute a join with acceptable performance, some internal state needs to be kept; otherwise the whole stream would need to be scanned each time a new element arrives. That state contains all elements of the stream within the time window (possible duplicates included). We will use a window of 5000 milliseconds in the following examples.

An inner join on two streams yields a result if a key appears in both streams within the window. Applied to the example, this produces the following results:

Scenarios 0 and 2 appear as expected, as the key appears in both streams within 5000 ms, even though the events arrive in a different order. Scenario 1 is missing as view and click did not appear within the window. Scenarios 3 and 4 are missing because they do not appear in both streams. Scenarios 5 and 6 appear duplicated as the keys appear twice in the view stream for scenario 5 and in the click stream for scenario 6.

A variation of this is the enforcement of ordering. The developer can specify that a click event can only be joined if it occurs after a view event. This setting would eliminate scenario 2 from the example.
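The windowed inner join, including the optional ordering constraint, can be approximated in plain Java. This is a simplified simulation under stated assumptions, not Kafka’s implementation: each stream’s state is a list of all events seen so far (a real join store only retains events for the window), the absolute timestamps are made up, and `WindowedInnerJoin`, `clickMustFollowView` and the other names are hypothetical. It uses scenarios 0, 1, 2 and 6, whose relative timings are given above.

```java
import java.util.ArrayList;
import java.util.List;

// Plain-JDK simulation of a windowed KStream-KStream inner join; not the Kafka Streams API.
final class WindowedInnerJoin {

    enum Side { VIEW, CLICK }

    record Event(Side side, String key, String label, long ts) {}

    record Joined(String view, String click) {}

    static List<Joined> join(List<Event> arrivals, long windowMs, boolean clickMustFollowView) {
        // Per-stream state: every event seen so far.
        List<Event> views = new ArrayList<>();
        List<Event> clicks = new ArrayList<>();
        List<Joined> out = new ArrayList<>();
        for (Event e : arrivals) {
            boolean isView = e.side() == Side.VIEW;
            // Probe the other stream's state for the same key within the window.
            for (Event o : isView ? clicks : views) {
                Event view = isView ? e : o;
                Event click = isView ? o : e;
                boolean sameKey = view.key().equals(click.key());
                boolean inWindow = Math.abs(click.ts() - view.ts()) <= windowMs;
                boolean orderOk = !clickMustFollowView || click.ts() >= view.ts();
                if (sameKey && inWindow && orderOk) {
                    out.add(new Joined(view.label(), click.label()));
                }
            }
            if (isView) {
                views.add(e);
            } else {
                clicks.add(e);
            }
        }
        return out;
    }

    // Scenarios 0, 1, 2 and 6 with hypothetical absolute timestamps, in timestamp order.
    static List<Event> scenarios() {
        return List.of(
                new Event(Side.VIEW, "0", "view-0", 0L),
                new Event(Side.VIEW, "1", "view-1", 0L),
                new Event(Side.CLICK, "2", "click-2", 0L),
                new Event(Side.VIEW, "6", "view-6", 0L),
                new Event(Side.CLICK, "6", "click-6.1", 500L),
                new Event(Side.CLICK, "0", "click-0", 1_000L),
                new Event(Side.VIEW, "2", "view-2", 1_000L),
                new Event(Side.CLICK, "6", "click-6.2", 1_200L),
                new Event(Side.CLICK, "1", "click-1", 10_000L));
    }

    public static void main(String[] args) {
        join(scenarios(), 5_000L, false).forEach(System.out::println); // 4 joined pairs
        System.out.println(join(scenarios(), 5_000L, true).size());    // 3: scenario 2 dropped
    }
}
```

With the symmetric 5000 ms window this yields four pairs (scenario 0, scenario 2 and both clicks of scenario 6); requiring the click to follow the view drops scenario 2, and scenario 1 never joins because its events are 10,000 ms apart.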

Outer Join KStream-KStream

The following will assume the idealistic ordering that each event is processed in the order of its timestamp. In practice, this is not always guaranteed.

An outer join will emit an output each time an event is processed in either stream. If the window state already contains an element with the same key in the other stream, the join method is applied to both elements. If not, it is applied to the incoming element alone, with null for the other side.

The following shows the idealistic result:

For scenario 0, an event is emitted once the view is processed, as there is no click yet. When the click arrives, the joined event on view and click is emitted. For case 1, we also get two output events. However, since the events do not occur within the window, neither of these events contains both view and click. Scenario 3 appears in the output without a click, and the equivalent output is emitted for scenario 4. Scenario 5 produces 4 output events, as the two views are each emitted immediately and once again when they are joined against the click. Case 6 only produces 3 events, as both clicks can be immediately joined against a view that arrived earlier.
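The idealistic outer join behaviour can be reproduced with a similar simulation that processes events strictly in timestamp order. Again, this is a hypothetical plain-Java sketch of the semantics described for Kafka 0.10.0 and 0.10.1 (the class `WindowedOuterJoin`, the helper `countFor` and the absolute timestamps are made up), not the library’s code.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Plain-JDK simulation of the idealistic windowed KStream-KStream outer join:
// events are processed strictly in timestamp order.
final class WindowedOuterJoin {

    enum Side { VIEW, CLICK }

    record Event(Side side, String key, String label, long ts) {}

    // Either side may be null when nothing could be joined yet.
    record Joined(String key, String view, String click) {}

    static List<Joined> join(List<Event> events, long windowMs) {
        // Idealistic assumption: process by timestamp (List.sort is stable).
        List<Event> ordered = new ArrayList<>(events);
        ordered.sort(Comparator.comparingLong(Event::ts));
        List<Event> views = new ArrayList<>();
        List<Event> clicks = new ArrayList<>();
        List<Joined> out = new ArrayList<>();
        for (Event e : ordered) {
            boolean isView = e.side() == Side.VIEW;
            boolean matched = false;
            for (Event o : isView ? clicks : views) {
                if (o.key().equals(e.key()) && Math.abs(o.ts() - e.ts()) <= windowMs) {
                    matched = true;
                    out.add(isView ? new Joined(e.key(), e.label(), o.label())
                                   : new Joined(e.key(), o.label(), e.label()));
                }
            }
            if (!matched) {
                // No partner in the window: emit the incoming event on its own.
                out.add(isView ? new Joined(e.key(), e.label(), null)
                               : new Joined(e.key(), null, e.label()));
            }
            if (isView) {
                views.add(e);
            } else {
                clicks.add(e);
            }
        }
        return out;
    }

    static long countFor(List<Joined> out, String key) {
        return out.stream().filter(j -> j.key().equals(key)).count();
    }

    // Scenarios 0, 1 and 6 with hypothetical absolute timestamps.
    static List<Event> scenarios() {
        return List.of(
                new Event(Side.VIEW, "0", "view-0", 0L),
                new Event(Side.CLICK, "0", "click-0", 1_000L),
                new Event(Side.VIEW, "1", "view-1", 0L),
                new Event(Side.CLICK, "1", "click-1", 10_000L),
                new Event(Side.VIEW, "6", "view-6", 0L),
                new Event(Side.CLICK, "6", "click-6.1", 500L),
                new Event(Side.CLICK, "6", "click-6.2", 1_200L));
    }

    public static void main(String[] args) {
        List<Joined> out = join(scenarios(), 5_000L);
        out.forEach(System.out::println);
        System.out.println(countFor(out, "0") + " " + countFor(out, "1") + " " + countFor(out, "6")); // 2 2 3
    }
}
```

It emits two records each for scenarios 0 and 1 and three for scenario 6, matching the idealistic counts described above.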

What happens in real life? Hard to say because the timing is important. In tests, the clicks for scenario 0 were always processed earlier than the views by the streaming application, despite using CreateTime for the message timestamps. This is probably what you should take away from this paragraph: as of Kafka 0.10.1.x, outer joins work on processing time, not message time.

EDIT: The claim above that outer joins work on processing time is incorrect. Kafka Streams works on event time, not processing time. However, there are runtime dependencies, and processing order matters. Thanks to Matthias Sax from Confluent for pointing that out.

Left Join KStream-KStream

This type of join is event-time dependent as well and has some behaviour that you might not expect due to a runtime dependency on processing order.
The left join emits an output event every time an event arrives in the left stream. If an event with the same key has previously arrived in the right stream within the window, it is joined with the one in the primary stream. Otherwise, the right-hand value is null. With our examples, that is going to result in a bit of a sad picture, as we only have one scenario where the click arrives before the view and has therefore already been processed. This leads to the following result:

Only scenario 2 yields a complete result in the idealistic version, but as expected from a left join, all elements from the left stream show up. This is one of the semantics that will change with Kafka 0.10.2 where “right” messages arriving within the window will cause the emission of a joined event.
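A sketch of these left join semantics, in which only arrivals on the left (view) stream emit output and right (click) arrivals are merely stored, shows why processing order matters: the same two events for scenario 0 give a different result depending on which one is processed first, even though their timestamps are unchanged. As before, `WindowedLeftJoin` is a hypothetical plain-Java simulation, not Kafka’s implementation.

```java
import java.util.ArrayList;
import java.util.List;

// Plain-JDK simulation of the KStream-KStream left join semantics of Kafka 0.10.0/0.10.1
// as described above. Events are processed in the order given, not sorted by timestamp.
final class WindowedLeftJoin {

    enum Side { VIEW, CLICK }

    record Event(Side side, String key, String label, long ts) {}

    record Joined(String view, String click) {}

    static List<Joined> join(List<Event> processingOrder, long windowMs) {
        List<Event> clicks = new ArrayList<>();
        List<Joined> out = new ArrayList<>();
        for (Event e : processingOrder) {
            if (e.side() == Side.CLICK) {
                clicks.add(e); // a right-side arrival never emits on its own
                continue;
            }
            boolean matched = false;
            for (Event c : clicks) {
                if (c.key().equals(e.key()) && Math.abs(c.ts() - e.ts()) <= windowMs) {
                    matched = true;
                    out.add(new Joined(e.label(), c.label()));
                }
            }
            if (!matched) {
                out.add(new Joined(e.label(), null)); // no click processed yet
            }
        }
        return out;
    }

    public static void main(String[] args) {
        Event view0 = new Event(Side.VIEW, "0", "view-0", 0L);
        Event click0 = new Event(Side.CLICK, "0", "click-0", 1_000L);
        System.out.println(join(List.of(view0, click0), 5_000L)); // [Joined[view=view-0, click=null]]
        System.out.println(join(List.of(click0, view0), 5_000L)); // [Joined[view=view-0, click=click-0]]
    }
}
```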

Inner Join KTable-KTable

Now we’re switching from KStreams to KTables. KTables are represented by materializing the incoming data into a local state store, building a table that always contains the latest update for a key. When a new record arrives, it is joined with the other table’s state store. Joins on KTables are not windowed as KTables describe a state, not a stream. The emitted events show state changes within the table.

It is not easy to figure out what happens here exactly. The following chart represents observations – please correct me if I’m wrong.
EDIT: I have gained a better insight into KTables. The behaviour that tripped me up is caused by caching. It works like this: by default, KTables use a cache for more efficient data processing. Results are only emitted once either the cache is full or the commit interval (default: 30 seconds) has been reached. For a better understanding of the join semantics, disabling caching is very helpful. The following chart shows results with and without caching:

All the inner join pairs are emitted in both scenarios. Since we’re no longer windowed, even scenario 1 is represented. Empty brackets represent a notification that the state for the keys of those scenarios has been updated, but as they could not be joined, the new state is empty/null. With caching enabled, all data has been processed before a cache flush happens, so we’re missing events 5.1 and 6.1 from the join output. Another curious thing is that events are duplicated. According to KAFKA-4609, this is caused by the fact that both source tables are flushed individually and therefore both trigger the join. This seems to fit, as the first set of emissions contains the empty update for scenario 3 whereas the second set contains scenario 4.

With caching disabled, the join semantics become a lot clearer. Very interesting are scenarios 5 and 6, where the stateful nature of KTables can be observed: view 5.1 is missing as it is updated in the source table by 5.2, whereas we get two joined outputs for scenario 6 because, while click 6.2 updates 6.1, the runtime was able to match 6.1 against view 6 before it was updated.
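The following simplified model reproduces the observations above for the inner join. It is an assumption-laden sketch, not Kafka’s cache implementation: the “cache” simply remembers which keys changed and forwards only their latest values in one final flush, and the two source tables are flushed one after the other, which mimics the duplication described in KAFKA-4609. The processing order of the scenarios and all names (`TableInnerJoin`, `withCache`, `withoutCache`, `finalState`) are hypothetical.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Simplified model of a KTable-KTable inner join with and without caching.
// NOT Kafka's implementation: the "cache" forwards only the latest value per key
// in one final flush, view table first, then click table.
final class TableInnerJoin {

    enum Side { VIEW, CLICK }

    record Update(Side side, String key, String value) {}

    // value == null models the empty "state changed, nothing joined" notification.
    record Emitted(String key, String value) {}

    static String joinValue(String view, String click) {
        return view != null && click != null ? view + "+" + click : null;
    }

    // Without caching: every source update is joined and forwarded immediately.
    static List<Emitted> withoutCache(List<Update> updates) {
        Map<String, String> views = new HashMap<>();
        Map<String, String> clicks = new HashMap<>();
        List<Emitted> out = new ArrayList<>();
        for (Update u : updates) {
            Map<String, String> target = u.side() == Side.VIEW ? views : clicks;
            target.put(u.key(), u.value());
            out.add(new Emitted(u.key(), joinValue(views.get(u.key()), clicks.get(u.key()))));
        }
        return out;
    }

    // With caching: state is updated, but output only happens at the flush.
    static List<Emitted> withCache(List<Update> updates) {
        Map<String, String> views = new HashMap<>();
        Map<String, String> clicks = new HashMap<>();
        Set<String> dirtyViews = new LinkedHashSet<>();
        Set<String> dirtyClicks = new LinkedHashSet<>();
        for (Update u : updates) {
            if (u.side() == Side.VIEW) {
                views.put(u.key(), u.value());
                dirtyViews.add(u.key());
            } else {
                clicks.put(u.key(), u.value());
                dirtyClicks.add(u.key());
            }
        }
        List<Emitted> out = new ArrayList<>();
        for (String k : dirtyViews) { // flush of the view table triggers the join...
            out.add(new Emitted(k, joinValue(views.get(k), clicks.get(k))));
        }
        for (String k : dirtyClicks) { // ...and so does the flush of the click table
            out.add(new Emitted(k, joinValue(views.get(k), clicks.get(k))));
        }
        return out;
    }

    // Final joined table: last value per key, where null means "no row".
    static Map<String, String> finalState(List<Emitted> emitted) {
        Map<String, String> table = new HashMap<>();
        for (Emitted e : emitted) {
            if (e.value() == null) {
                table.remove(e.key());
            } else {
                table.put(e.key(), e.value());
            }
        }
        return table;
    }

    // Scenarios 0-6 as source-table updates, in a hypothetical processing order.
    static List<Update> scenarios() {
        return List.of(
                new Update(Side.VIEW, "0", "view-0"), new Update(Side.CLICK, "0", "click-0"),
                new Update(Side.VIEW, "1", "view-1"), new Update(Side.CLICK, "1", "click-1"),
                new Update(Side.CLICK, "2", "click-2"), new Update(Side.VIEW, "2", "view-2"),
                new Update(Side.VIEW, "3", "view-3"), new Update(Side.CLICK, "4", "click-4"),
                new Update(Side.VIEW, "5", "view-5.1"), new Update(Side.VIEW, "5", "view-5.2"),
                new Update(Side.CLICK, "5", "click-5"), new Update(Side.VIEW, "6", "view-6"),
                new Update(Side.CLICK, "6", "click-6.1"), new Update(Side.CLICK, "6", "click-6.2"));
    }

    public static void main(String[] args) {
        List<Emitted> plain = withoutCache(scenarios());
        List<Emitted> cached = withCache(scenarios());
        System.out.println(plain.size() + " vs " + cached.size());                 // 14 vs 12
        System.out.println(finalState(plain).equals(finalState(cached)));          // true
        System.out.println(cached.contains(new Emitted("6", "view-6+click-6.1"))); // false
    }
}
```

In this model the non-cached variant emits one record per source update, the cached variant never contains the view 6 / click 6.1 combination and emits scenario 6 twice, and both variants end with the same joined table.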

Outer Join KTable-KTable

Apart from the duplication that occurs in the outer join as well when enabling caching, it behaves pretty much as expected:

For the version with caching, the results are the same as with the inner join with the addition of proper data for scenarios 3 and 4 which for scenario 3 contains the lone view and for scenario 4 the lone click.

The non-caching variant emits an event every time a new message arrives and executes the join if a matching entry is held in the other table.

Left Join KTable-KTable

Left joins exhibit the same duplication behaviour. Apart from that, they also work the way you’d expect by now:

With caching, we get a view-only element for case 3 and an empty/null event for scenario 4. The rest of the data is joined completely.
Disabling caching, we can see the semantics much more clearly. They behave as you’d expect a left join to behave.
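The non-caching outer and left joins from the last two sections can be sketched with the same kind of model: every update to either source table emits one record, and the left join emits an empty/null notification while the view side is missing. This is a hypothetical plain-Java illustration (`TableOuterLeftJoin`, `Mode` and the other names are made up), not Kafka’s implementation.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Simplified, non-caching model of KTable-KTable outer and left joins:
// every update to either source table emits exactly one record.
final class TableOuterLeftJoin {

    enum Side { VIEW, CLICK }

    enum Mode { OUTER, LEFT }

    record Update(Side side, String key, String value) {}

    record Pair(String view, String click) {}

    // value == null models the empty/null notification.
    record Emitted(String key, Pair value) {}

    static List<Emitted> join(List<Update> updates, Mode mode) {
        Map<String, String> views = new HashMap<>();
        Map<String, String> clicks = new HashMap<>();
        List<Emitted> out = new ArrayList<>();
        for (Update u : updates) {
            Map<String, String> target = u.side() == Side.VIEW ? views : clicks;
            target.put(u.key(), u.value());
            String view = views.get(u.key());
            String click = clicks.get(u.key());
            // A left join has no row while the left (view) side is missing.
            boolean noRow = mode == Mode.LEFT && view == null;
            out.add(new Emitted(u.key(), noRow ? null : new Pair(view, click)));
        }
        return out;
    }

    public static void main(String[] args) {
        List<Update> updates = List.of(
                new Update(Side.VIEW, "3", "view-3"),   // scenario 3: view, no click
                new Update(Side.CLICK, "4", "click-4"), // scenario 4: click, no view
                new Update(Side.VIEW, "0", "view-0"),
                new Update(Side.CLICK, "0", "click-0"));
        join(updates, Mode.OUTER).forEach(System.out::println);
        join(updates, Mode.LEFT).forEach(System.out::println);
    }
}
```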

This concludes the KTable-KTable section. The big lesson learned here is that the settings for caching play a very big role in the emission of events from a joined KTable. Yet the end result is the same – each joined KTable has the same content after completely processing the sample data, cached or non-cached. But the way we’re getting there is different.

Left Join KStream-KTable

As of Kafka 0.10.1.x and earlier, this is the only type of stream-table join. Its semantics imply that an incoming event in a stream can be joined against a table. The output of this operation is another stream (and not a table!). The join is highly timing dependent, and the example with views and clicks does not really work well here, as you’d probably not use a stream-table join for clicks in ad serving: the click usually arrives after the view. But in any case, please consider this contrived example:

We’re simulating the output of an incoming stream joined against a table of clicks that initially is populated with events 0, 2, 4 and 6.1. Each incoming stream event is joined against that table and the result either contains both view and click or only the view if there is no click to join against. Between views 5.1 and 5.2, click event 5 is registered. While view 5.1 could not be joined against a click, this update means that view 5.2 can be joined. There is no issue with duplication as the result is a stream and not a table.
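The contrived example can be simulated in a few lines of plain Java. `StreamTableLeftJoin` and `Input` are hypothetical names; the table is a map that is updated in place, and each stream record is joined against whatever the map holds at that moment. This sketches the described semantics, not Kafka’s implementation.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Plain-JDK simulation of the contrived KStream-KTable left join. Not the Kafka API.
final class StreamTableLeftJoin {

    // tableUpdate == true: a click updating the table; false: a view arriving on the stream.
    record Input(boolean tableUpdate, String key, String value) {}

    static List<String> join(Map<String, String> initialClicks, List<Input> inputs) {
        Map<String, String> clicks = new HashMap<>(initialClicks);
        List<String> out = new ArrayList<>();
        for (Input in : inputs) {
            if (in.tableUpdate()) {
                clicks.put(in.key(), in.value()); // table updates emit nothing
            } else {
                // Each stream record is joined against the table's current content;
                // a missing click shows up as "null".
                out.add(in.value() + " + " + clicks.get(in.key()));
            }
        }
        return out;
    }

    public static void main(String[] args) {
        Map<String, String> initial = Map.of(
                "0", "click-0", "2", "click-2", "4", "click-4", "6", "click-6.1");
        List<Input> inputs = List.of(
                new Input(false, "0", "view-0"),
                new Input(false, "1", "view-1"),
                new Input(false, "2", "view-2"),
                new Input(false, "3", "view-3"),
                new Input(false, "5", "view-5.1"),
                new Input(true, "5", "click-5"), // registered between views 5.1 and 5.2
                new Input(false, "5", "view-5.2"),
                new Input(false, "6", "view-6"));
        join(initial, inputs).forEach(System.out::println);
    }
}
```

It prints `view-5.1 + null` and later `view-5.2 + click-5`, because click 5 reaches the table in between; the output is a list of stream records, so nothing is deduplicated.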

Partitioning and Parallelization

If you are familiar with Kafka consumers, you are probably aware of the concept of consumer groups: Kafka consumption is parallelized by assigning each partition to exactly one consumer in a group of consumers that share the same group id. If you’re using defaults, Kafka itself will handle the distribution and assign the partitions to consumers, in a way that the consumers have little influence over.
With simple consumers, this is quite straightforward. However, what does it mean for a Kafka Streams application with state and joins? Can partitions still be distributed arbitrarily? No, they cannot. While you can run multiple instances of your streaming application and partitions will be distributed among them, there are requirements that are checked at startup. Topics that are joined need to be copartitioned, which means that they need to have the same number of partitions. The streaming application will fail if this is not the case. Producers to the topics also need to use the same partitioner, although the streaming application cannot verify this, as the partitioner is a property of the producer. For example, you will not get any join results if you send view event 0 to partition 0 and the corresponding click event to partition 1, even if both partitions are handled by the same instance of the streaming application.
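Why copartitioning matters can be shown with a toy partitioner. This sketch rests on assumptions throughout: Kafka’s default producer partitioner hashes the serialized key with murmur2, whereas the hypothetical `partitionFor` here uses `String.hashCode()` as a stand-in, and `checkCopartitioned` only mimics the startup check described above.

```java
// Sketch of why joined topics must be copartitioned. Not Kafka code: String.hashCode()
// stands in for the murmur2 hash of Kafka's default partitioner.
final class Copartitioning {

    // Same key + same hash function + same partition count => same partition.
    static int partitionFor(String key, int numPartitions) {
        return Math.floorMod(key.hashCode(), numPartitions);
    }

    // Mimics the startup check: differing partition counts are rejected.
    // Whether producers use the same partitioner cannot be checked here either.
    static void checkCopartitioned(int leftPartitions, int rightPartitions) {
        if (leftPartitions != rightPartitions) {
            throw new IllegalStateException("Topics not copartitioned: "
                    + leftPartitions + " vs " + rightPartitions + " partitions");
        }
    }

    public static void main(String[] args) {
        // Equal partition counts: view and click with key "7" land in the same partition.
        System.out.println(partitionFor("7", 4) + " / " + partitionFor("7", 4)); // 3 / 3
        // Different partition counts: the same key ends up in different partitions.
        System.out.println(partitionFor("7", 4) + " / " + partitionFor("7", 6)); // 3 / 1
        try {
            checkCopartitioned(4, 6);
        } catch (IllegalStateException e) {
            System.out.println(e.getMessage()); // Topics not copartitioned: 4 vs 6 partitions
        }
    }
}
```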

Summary and Outlook

Kafka Streams is a very interesting API that can handle quite a few use cases in a scalable way. However, some join semantics are a bit weird and might be surprising to developers. An example of this is the left and outer joins on streams, whose results depend on the order in which the events are processed. Some of those issues are addressed by KIP-77 and are scheduled to be released with Kafka 0.10.2.
The duplicates produced by KTable joins look weird, but they can provide a lot of benefit when used with the “Interactive Query” feature introduced with Kafka 0.10.1 where state stores can be directly queried.

Kafka’s journey from Pub/Sub broker to distributed streaming platform is well underway and times are very exciting.

References

The official Streams documentation
A GitHub repository with examples for all joins
Kafka Wiki describing current and future join semantics
Confluent’s general streams documentation
First-hand experience with Kafka as part of the SMACK stack


Blog author

Florian Troßbach

Senior IT Consultant
