Elasticsearch Zero Downtime Reindexing – Problems and Solutions

Reindexing Elasticsearch could be so easy. Ideally, we wouldn’t have to reindex at all. Why should you, when there is dynamic mapping? In this post I will explain why dynamic mapping won’t do you much good, how you can deal with inevitable errors in your static mapping, what zero downtime reindexing is, and finally how you can deal with the drawbacks of this approach.

Basics: In the end, everyone maps statically anyway.

So what happens when you throw a random JSON document at Elasticsearch and call it a day? After finding that the given index does not provide a mapping for that kind of data, Elasticsearch will try to derive a new mapping from the supplied data.

So if we throw a new document at a fresh Elasticsearch instance with dynamic mapping enabled:

POST blog/articles/1
{
  "author": "Chris",
  "title": "useful Cat facts (III)"
}

Elasticsearch will index it without complaint, because these are obviously string fields:

GET blog/articles/_mapping
{
  "blog": {
    "mappings": {
      "articles": {
        "properties": {
          "author": {
            "type": "string"
          },
          "title": {
            "type": "string"
          }
        }
      }
    }
  }
}

But we are in the era of Big Data, where input arrives chaotically, without much normalization. Let’s imagine someone comes along and posts a new blog post:

POST blog/articles/2
{
  "Author": "The Dude",
  "Title": "thats just like, your opinion man!"
}

This will be indexed just fine, but our new mapping will look like this:

{
  "blog": {
    "mappings": {
      "articles": {
        "properties": {
          "Author": {
            "type": "string"
          },
          "Title": {
            "type": "string"
          },
          "author": {
            "type": "string"
          },
          "title": {
            "type": "string"
          }
        }
      }
    }
  }
}

Yikes! That’s not what we wanted. Elasticsearch can’t determine whether this data is “legitimately” different or whether we’ve just been sloppy. So sooner or later (and hopefully sooner) you will start to define a mapping for your data.

Your Mapping is most likely wrong

Okay, now that we’ve got the basics out of the way, we get to the more sophisticated problems: what happens when your mapping is wrong? Generally, mapping changes are additive: when you add a field, the newly created Lucene segments will simply be bigger from now on, and the old segments are left as they were. Searches for the new field will still be applied to old segments, but will not produce hits there. Since Lucene never edits a segment once it has been written, this bubbles up to Elasticsearch: we cannot change a field’s type after data has been indexed.

We all know that our first guesses when setting things up are most likely not final, but will need to be revised later on. The very same happens when your Elasticsearch cluster is already in production.

The simplest way to tackle this would be to drop your current index, apply the new mapping, and index everything again. This approach is fine while you’re still in your dev (or maybe staging) environment. But in production, a reindex can easily take a couple of hours, maybe days. Good luck telling your customers you’re offline during that period. Also, this only works if your old data is available somewhere else to feed the reindex; otherwise you need to figure out how to do this without downtime.

Zero Downtime Reindexing

There is already a great entry in the Elasticsearch Guide, derived from a post on the official blog, that you should read, too. A short TL;DR:

Elasticsearch provides us with the fantastic and helpful concept of aliases. So to get to a seamless reindexing you do the following:

  • create an alias that points to the index with the old mapping
  • point your application to your alias instead of your index
  • create a new index with the updated mapping
  • move data from old to new
  • atomically move the alias from the old to the new index
  • delete the old stuff

The result: the cluster stays fully operational during the whole procedure and you experience no downtime!
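In terms of concrete API calls, the list above could look roughly like this (the index names blog_v1 and blog_v2, the alias name blog, and the sample mapping are assumptions for illustration; the _aliases endpoint applies all of its actions atomically):

POST /_aliases
{
  "actions": [
    { "add": { "index": "blog_v1", "alias": "blog" } }
  ]
}

PUT blog_v2
{
  "mappings": {
    "articles": {
      "properties": {
        "author": { "type": "string", "index": "not_analyzed" },
        "title":  { "type": "string" }
      }
    }
  }
}

POST /_aliases
{
  "actions": [
    { "remove": { "index": "blog_v1", "alias": "blog" } },
    { "add":    { "index": "blog_v2", "alias": "blog" } }
  ]
}

DELETE blog_v1

The data transfer between the two _aliases calls is up to you, typically a scan/scroll over the old index combined with bulk indexing into the new one.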

1. Where do WRITE operations go in the meantime?

Unfortunately, the official documentation does not discuss how to handle writes arriving at your cluster during the reindexing period. Such an operation might take a lot of time, depending on factors like your hardware, the size of your dataset, your analyzers, and so on. An alias does not allow us to write to both the old and the new index at the same time, so we need to take care of that ourselves. Currently I’d suggest two approaches:

1a) Duplicate writes yourself

The most straightforward solution is to change your application so that it writes the same data to both of your indices simultaneously.

[Figure: zero_downtime_reindex_double_write]

Obviously, duplicated writes have a performance impact when both indices live on the same machines. But it might be worth it: if your reindex process dies in the middle and you have no recovery mechanism implemented, your old index is still in a valid state.
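On the wire, a duplicated write is simply the same request issued against both concrete indices rather than the alias (index names and document are examples):

POST blog_v1/articles/3
{
  "author": "Chris",
  "title": "useful Cat facts (IV)"
}

POST blog_v2/articles/3
{
  "author": "Chris",
  "title": "useful Cat facts (IV)"
}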

1b) Write to the new index and read from both

The Guide states:

A search request can target multiple indices, so having the search alias point to tweets_1 and tweets_2 is perfectly valid. However, indexing requests can only target a single index. For this reason, we have to switch the index alias to point only to the new index.

If you are not in control of the software writing to your cluster, or the first approach is not feasible because of other environmental constraints, you can alternatively switch the write alias to the new index and read from both at the same time. Please note that you will get duplicates in your query results, so it is your application’s responsibility to deal with them. Concepts like pagination will also present additional hurdles.
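A sketch of the alias setup for this approach, assuming separate read and write aliases (the names blog_read and blog_write are examples): the read alias spans both indices, while the write alias points only to the new one.

POST /_aliases
{
  "actions": [
    { "add": { "index": "blog_v1", "alias": "blog_read"  } },
    { "add": { "index": "blog_v2", "alias": "blog_read"  } },
    { "add": { "index": "blog_v2", "alias": "blog_write" } }
  ]
}

The application then searches against blog_read (and must expect duplicates) and indexes against blog_write.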

[Figure: zero_downtime_reindex]

In conclusion, your application has to be aware of the reindexing process and behave according to your chosen strategy: either write to both indices or deal with duplicated results. Which way is acceptable depends on your application. But besides this point, the concept has another weakness:

2. Lost Updates and Deletes!

When we’re in the middle of a lengthy reindexing process, all incoming writes go to the new index. This is unproblematic for new documents: they are simply added to the new index and have no relation to the old one.

But what about an UPDATE or DELETE of an existing document? If the document has already been transferred to the new index, there is no problem. But if it has not, the operation will fail with an error, and later on the reindexing job will copy the document into the new index in its outdated version.

[Figure: lost_delete]

This outcome is not desirable and should be avoided! If your application supports updates and deletions, we have to include additional steps in our reindexing process. The basic idea is that you do not delete documents, but mark them as “deleted” instead and exclude them from queries. Here are some proposals to get you started:
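As a sketch, a logical delete could be an update that sets a flag (the field name deleted is an assumption, not part of Elasticsearch), combined with a filter that excludes flagged documents from queries:

POST blog/articles/1/_update
{
  "doc": { "deleted": true }
}

GET blog/articles/_search
{
  "query": {
    "filtered": {
      "query":  { "match_all": {} },
      "filter": { "not": { "term": { "deleted": true } } }
    }
  }
}

The "not" filter also matches documents that never received the flag, so existing documents do not need to be backfilled with "deleted": false.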

2a) Incremental Reindexing

For this approach to work, your whole infrastructure needs to adopt the following two concepts:

  • Every modification updates a timestamp field of the document.
    Instead of writing critical updates and deletions to the new index, we still apply them to the old one. The reindexing job then moves all documents that are older than its own start timestamp to the new index. Every update that happens during this time refreshes the document’s timestamp. Note that Elasticsearch already provides a _timestamp field that can be activated in the mapping.
  • When the reindexing job has terminated successfully, it starts again and transfers all modifications made during its last run. When it reaches an iteration in which there is nothing left to do, we consider it done and continue the wrap-up as in the regular process.
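A sketch of the moving parts, assuming the built-in _timestamp field is used: enable it in the mapping (ideally when the index is created, since it cannot necessarily be switched on after the fact), then let each iteration of the reindexing job select everything modified before its own start time (index name and timestamp value are examples):

PUT blog_v1
{
  "mappings": {
    "articles": {
      "_timestamp": { "enabled": true, "store": true }
    }
  }
}

GET blog_v1/articles/_search
{
  "query": {
    "filtered": {
      "filter": {
        "range": { "_timestamp": { "lt": "2014-09-18T12:00:00" } }
      }
    }
  }
}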

Drawbacks:

  • If you have a lot of deletions, you will artificially bloat your index. This can be mitigated by purging all marked-as-deleted documents after the reindex. Still, since a DELETE in Elasticsearch is just a mark-as-deleted in Lucene, there will be some bloat either way.
  • A logical delete implemented as an UPDATE is more expensive than a regular DELETE, so watch out for performance hits.
  • After the last reindexing iteration, there must be a “stop the world” phase to prevent any modifications from sneaking in. Our suggestion would be to include that in your deployment process if you can.

2b) Modification Buffering

If your reindexing is expected to take only a short amount of time, there is another solution to consider:

Elasticsearch offers simple optimistic concurrency control via the special _version field. If your application keeps this information during the GET -> modify -> UPDATE / DELETE cycle and sends it back, Elasticsearch will check whether the version still matches.

Example: if your document has version 1 and you send the UPDATE to Elasticsearch with this version as a parameter, and the document has not been transferred yet, you will get a VersionConflictEngineException. In this case, hold the update in your application and retry later (how much “later” is acceptable depends on your application and can ultimately only be answered by you).
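For illustration, such a versioned write looks like this (document and version number are examples):

PUT blog/articles/1?version=1
{
  "author": "Chris",
  "title": "useful Cat facts (III), revised"
}

If the document in the target index does not hold version 1 (or does not exist there yet), Elasticsearch answers with HTTP 409 and a VersionConflictEngineException, which is the application’s cue to buffer the modification and retry.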

The same drawback as in 2a applies: you cannot truly delete your documents anymore, but have to mark them as deleted as well.

Conclusion

It’s not so important which solution you take from this article; the most important point is to be aware of the drawbacks of the “official” reindexing procedure. How you work around these limitations depends on your business needs.

Comments

  • 18. September 2014 von Jayson Minard

    This is not a problem for updates, you have at least two easy options:

    option 1: updates written to new index block reindex operations that insert using optype=create so newer update always wins. why? because the reindex operation will fail if the record was written by an update first. the updates should be written as optype index (default) so they always win.

    option 2: optimistic concurrency with version as date of transaction, latest date wins regardless of order executed. if the update is newer, it beats the reindex operation. and for deletes they are tombstoned for you by elasticsearch. the window that deletes hang around waiting to block updates/adds is configurable. set it to some value higher than your total reindex time. they clean up on their own…

    • Patrick Peschlow

      Thanks for your very helpful suggestions. Just to clarify: You mentioned the index.gc_deletes setting for deletes only within option 2, but it should work for option 1 just the same, right? I like option 1 a bit better because it is applicable also when you are not already using the version field for some other application-related version information.

  • Firstly, thanks for this post. It’s great to have all these options cataloged in one place.

    I would personally consider it best practice to throw your new writes into a message queue, preferably something like Apache Kafka. You have a “blue” subscriber that listens to the topic and writes to the soon-to-be-retired index. Then you wait for the reindex job to finish and replay the queue as a “green” subscriber. One of the most interesting properties of Kafka is that you can replay messages that have already been consumed. When you connect to Kafka as a consumer, you define a topic, partition and offset, and this offset can be the tail position from 10 minutes ago or whenever you started the index job. After the “green” consumer has consumed everything up to the current tail, you flip the alias and retire the blue index. This avoids the accidental re-write scenario entirely.

    • Patrick Peschlow

      That’s a really good practice. Of course, the extra queuing delay and resulting async should not conflict with some functional or performance requirement of the application (let’s say the ability to real-time get indexed documents after the original call returns).

      Recently we had to migrate an index to a new Elasticsearch cluster and having such a queue in place would have greatly helped. In fact, we actually considered adding such indirection to the system before executing the reindexing job, but given the planned time frame it was not to be. I plan to report on this particular reindexing scenario in a future post on this blog.
