Earlier this year a customer mentioned a search requirement that I hadn’t really thought about before: How to achieve transactions in Elasticsearch? Recently, the same requirement popped up again in a conversation I had with other search aficionados. Time to share some thoughts about transactions in Elasticsearch and how to implement them!
Before we start looking for solutions, we need to clarify what we are actually talking about as the term transaction is quite overloaded. In our scenario, there is a stream of documents to be indexed, some of which form logical groups. The transactional requirement is that, for each document group, no document of that group becomes searchable before any other document of the same group. Put differently, all documents in a group become visible for search at exactly the same point in time. Thus, we are talking about atomicity with respect to visibility for search, and I’m going to refer to that as a transaction. So how to achieve transactions with Elasticsearch?
Elasticsearch indexing operations
Let’s take a look at how Elasticsearch handles indexing operations so that we know what we can build on. Elasticsearch internally uses Lucene, so let’s check how the two work together.
With Lucene, documents to be indexed are first kept in an in-memory buffer where they are neither searchable nor persisted. A Lucene flush() operation, which is executed based on certain heuristics, turns the in-memory buffer into a segment. A Lucene commit() goes one step further, as apart from forcing a flush() it persists any in-memory segments on disk and makes them searchable. Because a commit() is rather heavyweight in terms of I/O, Lucene has a feature called near-real-time search (NRT) which is able to make in-memory segments searchable even before commit() is called.
Thus, with Lucene, a segment is the atomic unit with respect to search visibility, but only a commit() guarantees durability. If you are curious why Lucene uses segments in the first place and how this approach benefits search performance, I highly recommend reading Lucene in Action, Second Edition, Manning 2010.
When Elasticsearch receives an indexing request via the REST API, it needs to persist the document so that it can send the client an acknowledgement of safe reception. As it would be way too costly to execute a Lucene commit() for each received document, Elasticsearch uses its own persistence mechanism called the transaction log. The transaction log supports two major operations which eventually map to the Lucene operations introduced above. The first operation, the Elasticsearch refresh(), turns transaction log contents into a segment and makes them available for search via NRT, which involves a Lucene flush(). The second operation, the Elasticsearch flush(), executes a Lucene commit() and then clears the transaction log as all its documents have now been persisted by Lucene. By default, the refresh() operation is scheduled every second and the flush() operation is executed depending on certain heuristics.
Changing refresh behavior
Via the REST API, Elasticsearch allows us to redefine the refresh interval or even disable automated refresh completely. One approach to achieve transactional search visibility would be to disable automated refresh and instead request refresh manually each time we sent Elasticsearch a full group of documents. This solution, however, works only under two assumptions:
- The stream emits all documents of a group in direct succession, without any other documents in between. If the stream instead mixed documents of several groups, there would never be a safe point in time to call refresh because there might always be a group where only part of the documents have been sent to Elasticsearch so far.
- There is no parallel indexing by multiple, independent clients. If there were multiple clients explicitly requesting refresh, any documents sent by other clients would become visible for search as well.
At our customer, none of these assumptions was satisfied, so changing refresh behavior was not an option. Apart from that, the approach is undesirable for performance reasons because issuing refresh with high frequency will create lots of small segments (which makes search more costly and requires many future segment merges).
Explicitly modeling transactions
If changing refresh behavior does not help, we need to represent transactions explicitly in our data model. How about this simple idea: We extend our document mapping by a boolean field, say “completed”, initially set to false for newly indexed documents. We extend all search queries by a filter for documents where “completed” is true. Only when all documents of a group have been indexed, we update “completed” to true for all of them.