The Concurrent Mark Sweep Collector (“CMS Collector”) of the HotSpot JVM has one primary goal: low application pause times. This goal is important for most interactive applications like web applications. Before we take a look at the relevant JVM flags, let us briefly recap the operation of the CMS Collector and the main challenges that may arise when using it.
Just like the Throughput Collector (see part 6 of the series), the CMS Collector handles objects in the old generation, yet its operation is much more complex. The Throughput Collector always pauses the application threads, possibly for a considerable amount of time, which in turn allows its algorithms to safely ignore the application. In contrast, the CMS Collector is designed to run mostly concurrently with the application threads and to cause only a few short pauses. The downside of running GC concurrently with the application is that various synchronization and data inconsistency issues may arise. To achieve safe and correct concurrent execution, a GC cycle of the CMS Collector is divided into a number of consecutive phases.
Phases of the CMS Collector
A GC cycle of the CMS Collector consists of six phases. Four of the phases (the names of which start with “Concurrent”) run concurrently with the actual application, while the other two phases need to stop the application threads.
- Initial Mark: The application threads are paused in order to collect their object references. When this is finished, the application threads are started again.
- Concurrent Mark: Starting from the object references collected during Initial Mark, all other referenced objects are traversed.
- Concurrent Preclean: Changes to object references made by the application threads while Concurrent Mark was running are used to update its results.
- Remark: As Concurrent Preclean runs concurrently as well, further changes to object references may have happened. Therefore, the application threads are stopped once more to take any such updates into account and ensure a correct view of referenced objects before the actual cleaning takes place. This step is essential because objects that are still referenced must never be collected.
- Concurrent Sweep: All objects that are not referenced anymore get removed from the heap.
- Concurrent Reset: The collector does some housekeeping work so that there is a clean state when the next GC cycle starts.
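The six phases listed above can be written down as a small model. This is only an illustrative sketch: the enum `CmsPhase` and its field `stopsTheWorld` are hypothetical names chosen for this example and are not part of HotSpot.

```java
import java.util.Arrays;

// Illustrative model of the six CMS phases described above; not HotSpot code.
enum CmsPhase {
    INITIAL_MARK(true),
    CONCURRENT_MARK(false),
    CONCURRENT_PRECLEAN(false),
    REMARK(true),
    CONCURRENT_SWEEP(false),
    CONCURRENT_RESET(false);

    // true if the application threads are paused during this phase
    final boolean stopsTheWorld;

    CmsPhase(boolean stopsTheWorld) {
        this.stopsTheWorld = stopsTheWorld;
    }

    // Number of phases that pause the application threads.
    static long countStopTheWorld() {
        return Arrays.stream(values()).filter(p -> p.stopsTheWorld).count();
    }
}

class CmsPhaseDemo {
    public static void main(String[] args) {
        // Print the phases in the order in which a cycle runs them.
        for (CmsPhase phase : CmsPhase.values()) {
            String kind = phase.stopsTheWorld ? "application paused" : "concurrent";
            System.out.println(phase + " (" + kind + ")");
        }
    }
}
```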
A common misconception is that the CMS collector runs fully concurrently with the application. We have seen that this is not the case, even if the stop-the-world phases are usually very short when compared to the concurrent phases.
It should be noted that, even though the CMS Collector offers a mostly concurrent solution for old generation GCs, young generation GCs are still handled using a stop-the-world approach. The rationale behind this is that young generation GCs are typically short enough so that the resulting pause times are satisfactory even for interactive applications.
When using the CMS Collector in real-world applications, we face two major challenges that may create a need for tuning:
- Heap fragmentation
- High object allocation rate
Heap fragmentation is possible because, unlike the Throughput Collector, the CMS Collector does not contain any mechanism for defragmentation. As a consequence, an application may find itself in a situation where an object cannot be allocated even though the total heap space is far from exhausted, simply because there is no contiguous memory area large enough to accommodate the object. When this happens, the concurrent algorithms do not help anymore and thus, as a last resort, the JVM triggers a full GC. Recall that a full GC runs the algorithm used by the Throughput Collector and thus resolves the fragmentation issues, but it also stops the application threads. Thus, despite all the concurrency that the CMS Collector brings, there is still a risk of a long stop-the-world pause happening. This is “by design” and cannot be switched off; we can only reduce its likelihood by tuning the collector. That is problematic for interactive applications that would like to have a 100% guarantee of being safe from any noticeable stop-the-world pauses.
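The fragmentation problem can be illustrated with a toy free list. The sketch below is not how HotSpot manages memory; the class `FragmentationSketch` and the chunk sizes are made up for this example. It only shows that plenty of total free space does not help if no single free chunk is large enough.

```java
class FragmentationSketch {
    // Sum of all free chunk sizes, i.e. the "total free space".
    static long totalFree(long[] freeChunks) {
        long sum = 0;
        for (long chunk : freeChunks) {
            sum += chunk;
        }
        return sum;
    }

    // Without compaction, a request only succeeds if one single chunk is large enough.
    static boolean canAllocate(long[] freeChunks, long requestSize) {
        for (long chunk : freeChunks) {
            if (chunk >= requestSize) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        long[] freeChunksKb = {64, 128, 32, 96}; // hypothetical free list, sizes in KB
        long requestKb = 200;                    // hypothetical object size in KB
        System.out.println("total free: " + totalFree(freeChunksKb) + " KB");
        System.out.println("can allocate " + requestKb + " KB: "
                + canAllocate(freeChunksKb, requestKb));
    }
}
```

In this toy example 320 KB are free in total, yet a 200 KB request fails because the largest chunk is only 128 KB.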
The second challenge is a high object allocation rate of the application. If the rate at which objects get instantiated is higher than the rate at which the collector removes dead objects from the heap, the concurrent algorithm fails once again. At some point, the old generation will not have enough space available to accommodate an object that is to be promoted from the young generation. This situation is referred to as “concurrent mode failure”, and the JVM reacts just like in the heap fragmentation scenario: it triggers a full GC.
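A back-of-the-envelope calculation makes the race between the application and the collector more tangible. The following sketch is a deliberately simplified model and not the JVM's actual logic: it assumes a constant promotion rate, ignores space that is freed while the cycle is still running, and all names and numbers are hypothetical.

```java
class ConcurrentModeFailureSketch {
    // Returns true if the old generation still has room for everything that is
    // promoted while one concurrent cycle of the given duration is running.
    static boolean cycleFinishesInTime(long freeOldGenBytes,
                                       long promotedBytesPerSecond,
                                       long cycleMillis) {
        // Bytes promoted into the old generation during the cycle.
        long promotedDuringCycle = promotedBytesPerSecond * cycleMillis / 1000;
        return promotedDuringCycle <= freeOldGenBytes;
    }

    public static void main(String[] args) {
        long mb = 1024L * 1024L;
        long freeOldGen = 256 * mb; // hypothetical free old generation space
        long cycleMillis = 2000;    // hypothetical duration of one CMS cycle

        // 100 MB/s promoted: 200 MB arrive during the cycle, which still fits.
        System.out.println(cycleFinishesInTime(freeOldGen, 100 * mb, cycleMillis));
        // 150 MB/s promoted: 300 MB arrive during the cycle, which does not fit.
        System.out.println(cycleFinishesInTime(freeOldGen, 150 * mb, cycleMillis));
    }
}
```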
When one of these scenarios manifests itself in practice (which, as is so often the case, usually happens on a production system), it often turns out that there is an unnecessarily large number of objects in the old generation. One possible countermeasure is to increase the young generation size in order to prevent premature promotion of short-lived objects into the old generation. Another approach is to use a profiler, or take heap dumps of the running system, to analyze the application for excessive object allocation, identify these objects, and eventually reduce the number of objects allocated.
In the following we will take a look at the most relevant JVM flags available for tuning the CMS Collector.
The flag -XX:+UseConcMarkSweepGC is needed to activate the CMS Collector in the first place. By default, HotSpot uses the Throughput Collector instead.
When the CMS collector is used, the flag -XX:+UseParNewGC activates the parallel execution of young generation GCs using multiple threads. It may seem surprising at first that we cannot simply reuse the flag -XX:+UseParallelGC known from the Throughput Collector, because conceptually the young generation GC algorithms used are the same. However, since the interplay between the young generation GC algorithm and the old generation GC algorithm is different with the CMS collector, there are two different implementations of young generation GC and thus two different flags.
Note that with recent JVM versions -XX:+UseParNewGC is enabled automatically when -XX:+UseConcMarkSweepGC is set. As a consequence, if parallel young generation GC is not desired, it needs to be disabled by setting -XX:-UseParNewGC.
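To check which collectors a running JVM has actually selected, the standard `java.lang.management` API can be queried. The class name `ActiveCollectors` is made up for this sketch. On JVMs that still ship the CMS Collector, starting the program with -XX:+UseConcMarkSweepGC typically reports collectors named "ParNew" and "ConcurrentMarkSweep", but the exact names depend on the JVM version and should be treated as an assumption.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;
import java.util.ArrayList;
import java.util.List;

class ActiveCollectors {
    // Names of the garbage collectors the running JVM reports via JMX.
    static List<String> collectorNames() {
        List<String> names = new ArrayList<>();
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            names.add(gc.getName());
        }
        return names;
    }

    public static void main(String[] args) {
        // Start for example with: java -XX:+UseConcMarkSweepGC ActiveCollectors
        // (only on JVM versions that still include the CMS Collector).
        System.out.println(collectorNames());
    }
}
```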
When the flag -XX:+CMSConcurrentMTEnabled is set, the concurrent CMS phases are run with multiple threads (and thus, multiple GC threads work in parallel with all the application threads). This flag is already activated by default. If serial execution is preferred, which may make sense depending on the hardware used, multithreaded execution can be deactivated via -XX:-CMSConcurrentMTEnabled.
The flag -XX:ConcGCThreads=<value> (in earlier JVM versions also known as -XX:ParallelCMSThreads) defines the number of threads with which the concurrent CMS phases are run. For example, value=4 means that all concurrent phases of a CMS cycle are run using 4 threads. Even though a higher number of threads may well speed up the concurrent CMS phases, it also causes additional synchronization overhead. Thus, for the particular application at hand, it should be measured whether increasing the number of CMS threads really brings an improvement.
If this flag is not explicitly set, the JVM computes a default number of parallel CMS threads which depends on the value of the flag -XX:ParallelGCThreads known from the Throughput Collector. The formula used is ConcGCThreads = (ParallelGCThreads + 3)/4. Thus, with the CMS Collector, the flag -XX:ParallelGCThreads affects not only the stop-the-world GC phases but also the concurrent phases.
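The formula above uses integer division, which is easy to overlook. The following sketch simply evaluates it for a few values; the class and method names are hypothetical, and the real default is of course computed inside the JVM.

```java
class CmsThreadDefaults {
    // ConcGCThreads = (ParallelGCThreads + 3) / 4, using integer division
    // as stated in the text.
    static int defaultConcGcThreads(int parallelGcThreads) {
        return (parallelGcThreads + 3) / 4;
    }

    public static void main(String[] args) {
        int[] samples = {1, 4, 5, 8, 13};
        for (int parallelGcThreads : samples) {
            System.out.println("ParallelGCThreads=" + parallelGcThreads
                    + " -> ConcGCThreads=" + defaultConcGcThreads(parallelGcThreads));
        }
    }
}
```

For example, 8 parallel GC threads yield 2 concurrent threads, and 13 yield 4.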
In summary, there are quite a few ways of configuring multithreaded execution of the CMS collector. Precisely for this reason, it is recommended to first run the CMS Collector with its default settings and then measure whether there is a need for tuning at all. Only if measurements in a production system (or a production-like test system) show that the pause time goals of the application are not reached should GC tuning via these flags be considered.
The Throughput Collector starts a GC cycle only when the heap is full, i.e., when there is not enough space available to store a newly allocated or promoted object. With the CMS Collector, it is not advisable to wait this long because the application keeps on running (and allocating objects) during concurrent GC. Thus, in order to finish a GC cycle before the application runs out of memory, the CMS Collector needs to start a GC cycle much earlier than the Throughput Collector.
As different applications have different object allocation patterns, the JVM collects run time statistics about the actual object allocations (and deallocations) it observes and uses them to determine when to start a CMS GC cycle. To bootstrap this process, the JVM takes a hint when to start the very first CMS run. The hint may be set via -XX:CMSInitiatingOccupancyFraction=<value> where value denotes the utilization of old generation heap space in percent. For example, value=75 means that the first CMS cycle starts when 75% of the old generation is occupied. Traditionally, the default value of CMSInitiatingOccupancyFraction is 68 (which was determined empirically quite some time ago).
We can use the flag -XX:+UseCMSInitiatingOccupancyOnly to instruct the JVM not to base its decision when to start a CMS cycle on run time statistics. Instead, when this flag is enabled, the JVM uses the value of CMSInitiatingOccupancyFraction for every CMS cycle, not just for the first one. However, keep in mind that in the majority of cases the JVM does a better job of making GC decisions than we humans do. Therefore, we should use this flag only if we have good reason (i.e., measurements) as well as really good knowledge of the lifecycle of objects generated by the application.
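The occupancy threshold boils down to a simple percentage comparison. The sketch below shows that comparison and, in `main`, applies it to the heap memory pools that the running JVM reports through the standard `java.lang.management` API. It only approximates the idea behind CMSInitiatingOccupancyFraction and is not the JVM's internal trigger logic; the class `OccupancyCheck` and the method `reachesThreshold` are hypothetical names.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryPoolMXBean;
import java.lang.management.MemoryType;
import java.lang.management.MemoryUsage;

class OccupancyCheck {
    // True if used/capacity has reached the given percentage.
    // Written without floating point: used * 100 >= capacity * percent.
    static boolean reachesThreshold(long usedBytes, long capacityBytes, int fractionPercent) {
        return usedBytes * 100 >= capacityBytes * (long) fractionPercent;
    }

    public static void main(String[] args) {
        int fractionPercent = 75; // same example value as in the text
        for (MemoryPoolMXBean pool : ManagementFactory.getMemoryPoolMXBeans()) {
            MemoryUsage usage = pool.getUsage();
            // Skip non-heap pools, invalid pools and pools without a defined maximum.
            if (pool.getType() != MemoryType.HEAP || usage == null || usage.getMax() <= 0) {
                continue;
            }
            System.out.println(pool.getName()
                    + ": used=" + usage.getUsed()
                    + " max=" + usage.getMax()
                    + " reaches " + fractionPercent + "%: "
                    + reachesThreshold(usage.getUsed(), usage.getMax(), fractionPercent));
        }
    }
}
```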
In contrast to the Throughput Collector, the CMS Collector does not perform GC in the permanent generation by default. If permanent generation GC is desired, it can be enabled via -XX:+CMSClassUnloadingEnabled. In earlier JVM versions, it may be required to additionally set the flag -XX:+CMSPermGenSweepingEnabled. Note that, even if this flag is not set, there will be an attempt to garbage-collect the permanent generation once it runs out of space, but the collection will not be concurrent. Instead, once again, a full GC will be run.
The flag -XX:+CMSIncrementalMode activates the incremental mode of the CMS Collector. Incremental mode pauses the concurrent CMS phases regularly, so as to yield completely to the application threads. As a consequence, the collector will take longer to complete a whole CMS cycle. Therefore, using incremental mode only makes sense if it has been measured that the threads running a normal CMS cycle interfere too much with the application threads. This happens rather rarely on modern server hardware, which usually has enough processors available to accommodate concurrent GC.
-XX:+ExplicitGCInvokesConcurrent and -XX:+ExplicitGCInvokesConcurrentAndUnloadsClasses
Nowadays, the widely accepted best practice is to refrain from explicitly invoking GC (a so-called “system GC”) by calling System.gc() in the application. While this advice holds regardless of the GC algorithm used, it is worth mentioning that a system GC is an especially unfortunate event when the CMS Collector is used, because it triggers a full GC by default. Luckily, there is a way to change the default. The flag -XX:+ExplicitGCInvokesConcurrent instructs the JVM to run a CMS GC instead of a full GC whenever a system GC is requested. There is a second flag, -XX:+ExplicitGCInvokesConcurrentAndUnloadsClasses, which additionally ensures that the permanent generation is included in the CMS GC in case of a system GC request. Thus, by using these flags, we can safeguard ourselves against unexpected stop-the-world system GCs.
And while we are on the subject… this is a good opportunity to mention the flag -XX:+DisableExplicitGC, which tells the JVM to completely ignore system GC requests (regardless of the type of collector used). For me, this flag belongs to a set of “default” flags that can be safely specified on every JVM run without further thinking.
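The effect of these flags on System.gc() can be observed with a small probe that compares the collection counts reported by the JVM before and after an explicit GC request. This is a sketch with a hypothetical class name, `ExplicitGcProbe`. What it prints depends on the collector and flags in use, so no particular output is claimed here.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;

class ExplicitGcProbe {
    // Sum of the collection counts of all collectors reported by the JVM.
    static long totalCollections() {
        long total = 0;
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            long count = gc.getCollectionCount(); // -1 if the count is undefined
            if (count > 0) {
                total += count;
            }
        }
        return total;
    }

    public static void main(String[] args) {
        long before = totalCollections();
        System.gc(); // the explicit "system GC" request discussed above
        long after = totalCollections();
        // Compare runs with and without -XX:+DisableExplicitGC to see the difference.
        System.out.println("collections counted around System.gc(): " + (after - before));
    }
}
```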