Transactions in Elasticsearch

Earlier this year a customer mentioned a search requirement that I hadn’t really thought about before: How to achieve transactions in Elasticsearch? Recently, the same requirement popped up again in a conversation I had with other search aficionados. Time to share some thoughts about transactions in Elasticsearch and how to implement them!

Before we start looking for solutions, we need to clarify what we are actually talking about as the term transaction is quite overloaded. In our scenario, there is a stream of documents to be indexed, some of which form logical groups. The transactional requirement is that, for each document group, no document of that group becomes searchable before any other document of the same group. Put differently, all documents in a group become visible for search at exactly the same point in time. Thus, we are talking about atomicity with respect to visibility for search, and I’m going to refer to that as a transaction. So how to achieve transactions with Elasticsearch?

Elasticsearch indexing operations

Let’s take a look at how Elasticsearch handles indexing operations so that we know what we can build on. Elasticsearch internally uses Lucene, so let’s check how the two work together.

With Lucene, documents to be indexed are first kept in an in-memory buffer where they are neither searchable nor persisted. A Lucene flush() operation, which is executed based on certain heuristics, turns the in-memory buffer into a segment. A Lucene commit() goes one step further, as apart from forcing a flush() it persists any in-memory segments on disk and makes them searchable. Because a commit() is rather heavyweight in terms of I/O, Lucene has a feature called near-real-time search (NRT) which is able to make in-memory segments searchable even before commit() is called.

Illustration of the Lucene flush() and commit() operations.

Thus, with Lucene, a segment is the atomic unit with respect to search visibility, but only a commit() guarantees durability. If you are curious why Lucene uses segments in the first place and how this approach benefits search performance, I highly recommend reading Lucene in Action, Second Edition, Manning 2010.

When Elasticsearch receives an indexing request via the REST API, it needs to persist the document so that it can send the client an acknowledgement of safe reception. As it would be way too costly to execute a Lucene commit() for each received document, Elasticsearch uses its own persistence mechanism called the transaction log. The transaction log supports two major operations which eventually map to the Lucene operations introduced above. The first operation, the Elasticsearch refresh(), turns transaction log contents into a segment and makes them available for search via NRT, which involves a Lucene flush(). The second operation, the Elasticsearch flush(), executes a Lucene commit() and then clears the transaction log as all its documents have now been persisted by Lucene. By default, the refresh() operation is scheduled every second and the flush() operation is executed depending on certain heuristics.
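As a sketch, both operations can also be triggered manually via the REST API; in Sense notation (the index name my_index is a placeholder):

```
POST /my_index/_refresh

POST /my_index/_flush
```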

Illustration of the Elasticsearch transaction log, the refresh() and flush() operations and how they call Lucene operations.

Changing refresh behavior

Via the REST API, Elasticsearch allows us to redefine the refresh interval or even disable automated refresh completely. One approach to achieving transactional search visibility would be to disable automated refresh and instead request a refresh manually each time we have sent Elasticsearch a full group of documents. This solution, however, works only under two assumptions:

  • The stream emits all documents of a group in direct succession, without any other documents in between. If the stream instead mixed documents of several groups, there would never be a safe point in time to call refresh because there might always be a group where only part of the documents have been sent to Elasticsearch so far.
  • There is no parallel indexing by multiple, independent clients. If there were multiple clients explicitly requesting refresh, any documents sent by other clients would become visible for search as well.
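For completeness, here is a sketch of what this would look like in Sense notation (the index name my_index is a placeholder): disable automated refresh via the index settings, then request a refresh manually after a full group has been sent:

```
PUT /my_index/_settings
{
    "index": { "refresh_interval": "-1" }
}

POST /my_index/_refresh
```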

For our customer, neither of these assumptions was satisfied, so changing the refresh behavior was not an option. Apart from that, the approach is undesirable for performance reasons: issuing refreshes at high frequency creates lots of small segments, which makes search more costly and requires many future segment merges.

Explicitly modeling transactions

If changing refresh behavior does not help, we need to represent transactions explicitly in our data model. How about this simple idea: We extend our document mapping by a boolean field, say “completed”, initially set to false for newly indexed documents. We extend all search queries by a filter for documents where “completed” is true. Only when all documents of a group have been indexed, we update “completed” to true for all of them.
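As a sketch, the corresponding search query would simply filter on the flag (the "completed" and "data" field names are from the text, everything else is hypothetical):

```
"query": {
    "filtered": {
        "query": { "match": { "data": "part1" } },
        "filter": { "term": { "completed": true } }
    }
}
```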

While this approach may sound promising at first, it has several drawbacks. First, it is not reliable, because of refresh behavior: updating “completed” for all documents of a group is not an atomic operation, so if a refresh happens in the middle of that operation, only part of the documents will be visible for search. Second, it is not particularly cheap, because all documents are effectively indexed twice. Third, how would we actually implement the update that sets “completed” to true? Right now we would have to query for all documents of the group first and then update each of them by ID (though it should be said that some support for multi-document updates is apparently planned for a future Elasticsearch version).

Let’s look for a better way to tie the documents of a group together. A primer on data modeling in Elasticsearch is this chapter of The Definitive Guide (highly recommended reading), according to which we have three options: Inner objects, nested objects, and parent-child relationships. A solution based on inner or nested objects would require storing the contents of several documents in a single document, which means we would lose the ability to query and retrieve them individually (in case of nested objects) or the sub-documents would lose their identity altogether (in case of inner objects). Parent-child relationships are the only way to establish relations between multiple documents and leave them intact otherwise.

Transactions with parent-child relationships

The parent-child mechanism offered by Elasticsearch supports the modeling of one-to-many relationships. A document type may refer to a parent type in its mapping, and each document becomes a child of the parent document it is referring to. Every time a child document is indexed, it needs to specify the ID of its parent document.
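In Sense notation, indexing a child document could look like this (the index name and IDs are hypothetical); the parent ID is passed as a request parameter:

```
PUT /my_index/document/1?parent=tx1
{
    "transaction_id": "tx1",
    "data": "part1"
}
```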

We will use these parent-child relationships to implement transactions. In order to avoid cluttering this article with lots of code, I will only outline and discuss the solution here. In this Gist you will find a complete, commented scenario where the solution is demonstrated using the Sense tool that ships as part of Elasticsearch Marvel (and thus is free for development use). If you haven’t used Sense yet, give it a try and you will never want to use curl again for talking to Elasticsearch.

We define the following mapping for a type called “document”, representing the documents of the stream that we actually want to index:

"document": {
    "_parent": { "type": "transaction" },
    "properties": {
        "transaction_id": { "type": "string", "index": "not_analyzed" },
        "data": { "type": "string", "index": "not_analyzed" }
    }
}

Each document carries a transaction ID (uniquely identifying the group the document belongs to) as well as its actual data (here I assume a simple string field, but usually there will be much more). Furthermore, each document refers to a parent document of the “transaction” type. Let’s look at the mapping for the “transaction” type:

"transaction": {
    "properties": {
        "id": { "type": "string", "index": "not_analyzed" }
     }
 }

A “transaction” document only contains an ID (the transaction ID), nothing more. We will use the existence of the “transaction” document for a group to tell whether that group of documents is complete: As long as the group has not been completely sent to Elasticsearch, we don’t create the parent document for that group. As soon as the group is complete, however, we immediately create the parent document.
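Completing a group then amounts to a single additional indexing request, for example (index name and ID hypothetical):

```
PUT /my_index/transaction/tx1
{
    "id": "tx1"
}
```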

By including a check for the existence of the parent document into queries for child documents, we can ensure that search only ever returns documents of completed groups. For example, the following query searches for all documents where the “data” field matches “part1”, but thanks to the additional parent filter only those with existing parents, i.e., completed transactions, will be returned:

"query": {
    "filtered": {
        "query": {
            "match": { "data": "part1" }
        },
        "filter": {
            "has_parent": {
                "type": "transaction",
                "query": { "match_all": {} }
            }
        }
    }
}

It’s as simple as that! The benefits of this solution are:

  • Every document is only indexed once, with only one additional indexing request for each transaction.
  • Apart from the reference to the parent type, the document mapping remains unchanged.
  • No particular ordering of the document stream is required, except that it must be possible to identify the last document of a group.

Naturally, the approach also has some limitations. If multiple clients work in parallel and documents of the same group can be handled by different clients, one client may have already indexed the last document of a group while other documents of the same group are still queued at other clients. Also, the solution cannot work if there is no way to identify the last document of a group.

Both issues may be addressed by a modification of the approach which becomes possible if the number of documents in a group is known beforehand. Then we can store a counter in the parent document, and each document indexed increments the counter by one. If we modify the search request accordingly, we can ensure that documents only become visible for search once the counter has reached the expected number of documents for their group. I also created a showcase for this modification which you can find here.
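The increment could be implemented as a scripted partial update on the parent document; a rough sketch in Sense notation (index name, type, and ID are hypothetical, and dynamic scripting must be enabled in the cluster):

```
POST /my_index/transaction/tx1/_update
{
    "script": "ctx._source.counter += 1"
}
```

The full, working version of this variant is in the linked showcase.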

Finally, we need to be aware that the use of parent-child relationships may involve non-negligible overhead regarding search time and memory. If queries turn out to be too slow, or if memory issues arise, it may make sense to look for alternative solutions based on the nested objects approach after all, even if that would involve combining all documents of a group into a single one.

Comments

  • Hello Patrick
    Thank you for your considerations. They helped me a lot. I tried to apply your approach to update and delete transactions as well, but somehow I cannot find a proper solution, mainly because it does not seem possible to modify the parent of a document.
    Do you have a solution for this kind of problem? Best regards, Michael

  • Patrick Peschlow

    Hey Michael,

    it’s correct that you cannot apply the same idea (having queries consider some field in a common parent document) to updates and deletes. If you query for the new value of the field (the one you would set after the update/delete has completed), then you wouldn’t get the old versions of the documents anymore. With inserts, this is not a problem because the documents were not visible in the first place (there is no previous state to be missed).

    However, what you ask for would be possible to do on the Lucene level. So if you can make some additional assumptions, like there is only a single writer and the update/delete transactions don’t overlap each other, then you can probably use the Elasticsearch refresh mechanism to get the desired behavior. If you decide to go down that road you should make sure to evaluate the performance of the resulting solution, because there will likely be some performance penalty.

    Depending on your exact requirements (what exactly is an update or delete transaction for you? is it exactly like I described, for visibility, or is it different), there may be other ways.

    -Patrick

  • Hey Patrick

    I’m working on the renewal of our portfolio management system.
    Elastic will play a leading role in our new architecture. For most of our data, Elastic is
    not the primary data store. Nevertheless, there will be some application-specific data, simulations for example.
    As we all come from an RDBMS world, we feel quite uncertain about the lack of transactions, most of all because we are building a financial app, not some kind of social media thing!
    Your considerations are very valuable to us, in terms of what we can do with Elastic but also what we eventually cannot.
    At the moment I tend to let a hammer be a hammer and a nail be a nail, and to use an RDBMS where it’s really necessary.
    But you are right, one has to look carefully at the exact requirements of each and every use case.

    Again, thank you very much!
    Michael
