Overview

Writing better Caches in Action: Improving Joda Time

No Comments

Working with time and date in any programming language is a mess. To get an idea how much of an mess it actually is, I can recommend many reads, but the following video by Tom Scott nicely sums it up:

Java had two tries to come up with a good date and time implementation: java.util.Date and java.util.Calendar, but both failed. Then Stephen Colebourne came along and invented joda-time. Joda-Time is one of my favourite libraries and is without any doubt the best implementation of time and date handling you can find for Java. It was so good, that it became the foundation of JSR-310, the newest attempt at a core Java API for date management which is now shipped with Java 8, in many aspects.
While profiling some of our code recently, I finally saw the opportunity to say “thank you, Stephen” in my own way: by contributing patches.

One of the key features in Joda-Time is thread safety. It allows formatters and chronologies to be reused by multiple threads. But how exactly they are configured is up to the developer. It is possible to come up with any format pattern (like, “dd. hh:mm – yy”) and use it. Because the creation of formatters is quite expensive, Joda Time will automatically cache them. In fact the caching guarantees are pretty strong: For all possible configurations of formatters and chronologies, only a single instance will ever exist and you always will get that one back.

To achieve this, the caches were “synchronized”.

Here is one example:

Map<String, DateTimeFormatter> cCache = new HashMap<String, DateTimeFormatter>()
 
private DateTimeFormatter getFormatter(Locale locale) {
  locale = (locale == null ? Locale.getDefault() : locale);
  String key = Integer.toString(iType + (iDateStyle << 4) + (iTimeStyle << 8)) + locale.toString();
  DateTimeFormatter f = null;
  synchronized (cCache) {
    f = cCache.get(key);
    if (f == null) {
      String pattern = getPattern(locale);
      f = DateTimeFormat.forPattern(pattern);
      cCache.put(key, f);
    }
  }
  return f;
}

This code is typical for cache implementations. It is safe programming. In fact I am glad that the code is already safe, which unfortunately a lot of caching code is not. It is, however, not up to date to “modern” Java programming standards.

Here is how it works: After having computed a cache key, which involves Locale.toString() (a method which is surprisingly expensive), all multithreaded access is synchronized on the cache instance. Then the entry for the key is looked up. If it is not found, the entry is computed and added to the cache. And finally the synchronization ends, allowing other threads to look into the cache.

The main reason why this is inefficient is that threads need to wait, even if they would access a different value. Some people believe that the JVM optimizes the synchronized block. In fact it does, when it sees access by a single thread only. But in many caching cases, this is just not true. The cache is accessed from plenty of threads and this involves the object monitor overhead.

So the obvious idea is to not perform locking/synchronization when just reading from the cache. Thankfully Doug Lea already developed a Map which exactly does that: java.util.concurrent.ConcurrentHashMap.
It is a general purpose concurrent map, which does not lock on reads. While it is possible to write even faster or more efficient data structures, it performs really well as a drop in replacement for the above code.

The rewritten code looks like this:

private DateTimeFormatter getFormatter(Locale locale) {
  locale = (locale == null ? Locale.getDefault() : locale);
  StyleFormatterCacheKey key = new StyleFormatterCacheKey(iType, iDateStyle, iTimeStyle, locale);
  DateTimeFormatter f = cCache.get(key);
  if (f == null) {
    f = DateTimeFormat.forPattern(getPattern(locale));
    DateTimeFormatter oldFormatter = cCache.putIfAbsent(key, f);
    if (oldFormatter != null) {
      f = oldFormatter;
    }
  }
  return f;
}

Whats different? Well first of all, I introduced class called StyleFormatterCacheKey, which gets the values previously used to compute the cache key. Why is that? Well the java.util.HashMap and java.util.concurrent.ConcurrentHashMap both will perform equals() and hashCode() calls on the key. It is more efficient to delegate these calls to the original data structures, instead of creating a java.lang.String which will than perform its own computations.
As you can see cChache.get() is no longer synchronized. Gets can run through incredibly fast.

But what happens on a miss?
What happens here is that possibly multiple threads create a new org.joda.time.format.DateTimeFormat and try to put it to the cache. The tricky part is putIfAbsent(), which ensures that only the first entry added to the map is actually used. Any later formatters by other threads are not added to the cache.
Here now the strong caching guarantee comes into play. If our code shall never see two different formatters, we have to then discard our own computed formatter and return the one which was the first one added to the cache.
If your cache has less strict requirements, you could even get away with ignoring the return value or replacing the value in the map by calling just put().

So, you might wonder, how this actually performs? I have done a quite simplistic benchmark using the great JMH benchmarking framework. It assumes the fact that the same pattern is used all the time (so we have concurrency on insertion and read) by multiple threads. What I did not tell so far is that the above getFormatter() call is happening behind the scenes whenever you print or parse dates, so the benchmark is actually doing that.

Here its result I got when running with Java 8 on my MacBookPro:
Old code single threaded

Mode   Samples         Mean   Mean error    Units
thrpt        20     2022.227       46.700   ops/ms

Old code multi threaded (3 threads)

Mode   Samples         Mean   Mean error    Units
thrpt        20     5806.483      133.980   ops/ms

New code single threaded

Mode   Samples         Mean   Mean error    Units
thrpt        20     2324.084       30.112   ops/ms

New code multi threaded (3 threads)

Mode   Samples         Mean   Mean error    Units
thrpt        20     6228.543      174.374   ops/ms

(full original benchmark at: https://gist.github.com/CodingFabian/9088631)

So, it’s not exciting, is it? Well, it shows that coming up with an appropriate benchmark is hard. It is a “good case” benchmark, where we do not have many writes, which would prolong the synchronized block. Still the java.util.concurrent.ConcurrentHashMap usage is 10% faster with similar characteristics with regards to scaling with threads.

So, should we care? Hmm, maybe. Let’s look at a different example. This time we want to use a org.joda.time.chrono.BuddhistChronology to calculate buddhistic dates:

BuddhistChronology.getInstance(DateTimeZone.getDefault());

Lets do a benchmark first, shall we?

Single threaded:

Mode   Samples         Mean   Mean error    Units
thrpt        20    26315.317      278.507   ops/ms

Multi threaded using 3 Threads:

Mode   Samples         Mean   Mean error    Units
thrpt        20    13817.764      264.348   ops/ms

(original full benchmark at: https://gist.github.com/CodingFabian/9058922)

Wait, what? Running multiple threads actually drastically reduces the performance?
Here is the code:

public static synchronized BuddhistChronology getInstance(DateTimeZone zone) {
  if (zone == null) {
      zone = DateTimeZone.getDefault();
  }
  BuddhistChronology chrono;
  synchronized (cCache) {
      chrono = cCache.get(zone);
      if (chrono == null) {
          // First create without a lower limit.
          chrono = new BuddhistChronology(GJChronology.getInstance(zone, null), null);
          // Impose lower limit and make another BuddhistChronology.
          DateTime lowerLimit = new DateTime(1, 1, 1, 0, 0, 0, 0, chrono);
          chrono = new BuddhistChronology(LimitChronology.getInstance(chrono, lowerLimit, null), "");
          cCache.put(zone, chrono);
      }
  }
  return chrono;
}

This looks awfully like the previous example, does it? Again cCache is a java.util.HashMap. Notice that the whole method is synchronized.
Here is what i changed it to:

public static BuddhistChronology getInstance(DateTimeZone zone) {
  if (zone == null) {
      zone = DateTimeZone.getDefault();
  }
  BuddhistChronology chrono = cCache.get(zone);
  if (chrono == null) {
      // First create without a lower limit.
      chrono = new BuddhistChronology(GJChronology.getInstance(zone, null), null);
      // Impose lower limit and make another BuddhistChronology.
      DateTime lowerLimit = new DateTime(1, 1, 1, 0, 0, 0, 0, chrono);
      chrono = new BuddhistChronology(LimitChronology.getInstance(chrono, lowerLimit, null), "");
      BuddhistChronology oldChrono = cCache.putIfAbsent(zone, chrono);
      if (oldChrono != null) {
          chrono = oldChrono;
      }
  }
  return chrono;
}

not much difference. I removed the synchronized keywords and used a java.util.concurrent.ConcurrentHashMap.

Eager to see what effect it had? Lets run the same bench again:

Single threaded:

Mode   Samples         Mean   Mean error    Units
thrpt        20   198646.049     2716.604   ops/ms

Multi threaded using 3 threads:

Mode   Samples         Mean   Mean error    Units
thrpt        20   554412.528    21280.949   ops/ms

No, I am not cheating. This is real. We get a ten times increase in performance even in single threaded usage! And it suddenly scales in a linear manner rather than crash and burn.

Conclusion

synchronized is a Java keyword, and the JVM can do arcane tricks to optimize it. But sometimes the JVM or the developer fails with it. java.util.concurrent.ConcurrentHashMap is a much better alternative, because its semantics and performance are defined by its API. It should be the go-to solution for building multithreaded accessed maps.

References

1. Commit optimizing Chronology caches: https://github.com/JodaOrg/joda-time/commit/37ce3704787a4967a6238e3f3f56894f7554daef
related benchmark https://gist.github.com/CodingFabian/9058922

2. Commit optimizing Formatter caches: https://github.com/JodaOrg/joda-time/commit/acfec45d0324e41b8159cf2e63071390893e2037
Related benchmark https://gist.github.com/CodingFabian/9088631

Comment

Your email address will not be published. Required fields are marked *