Overview

The JVM on Fire – Using Flame Graphs to Analyse Performance

No Comments

Currently there are several tools available to analyse your application performance and show the results. Traditionally these results are either shown in some kind of table form, either flat or as a tree view. Flame graphs are relatively new and take a new angle to show the results.
Furthermore, flame graphs can be generated on various levels; tooling hooking into the JVM, but also on (Linux / MacOS) OS level.

In this post I’m hoping to give you some idea about what flame graphs are, how to read them and what tooling is available to create them.

What are Flame Graphs?

Flame graphs were invented by Brendan Gregg as a means to show profiling results. Or, as he puts it:

Flame graphs are a visualization for sampled stack traces, which allows hot code-paths to be identified quickly.

Below you will see an example flame graph from a Spring Boot demo application.

Flame graphs are generally created from the results of sampling profilers. These will create stack samples which can then be transformed / folded to get the right format for generating the flame graph. The graph then is generated by the Flame Graph tool by Brendan Gregg, a simple Perl script that will output an SVG image.
The nice thing about the SVG image is that it can be searched and filtered (unfortunately not possible in this blog post), which makes it really easy to traverse. Also the input for generating the flame graph is a simple text file that can be easily hacked to add / filter information!

A flame graph will give you the following information:

  • Y-axis / Height: presents the stack depth
  • X-axis / Width: presents how much time (number of samples) was spent within a method
  • Color: depending on the configuration will e.g. highlight Java, C++, kernel methods

Important to note is that the X-axis != time passing. The stack frames are sorted alphabetically from left to right. So only look at the width, not where they are on the axis.

Compare that to the JProfiler tree view below, where the overview gets lost much more quickly.

As always when looking at profiling results, though, make sure you know what you are looking at. Certain profilers e.g. suffer from ‘safepoint sampling bias’ (therefore we’ll use the Honest Profiler later on). Similar when using instrumentation, because of additional overhead this might mess up the results and is therefore not recommended.

Needed to get started

The really nice thing about flame graphs is that you can generate them for Java only, but also on OS level (Linux and OS X) to get lower level information in there as well. Previously the system profilers could not fill the entire Java stack because information got lost. Since JDK8 update 60 build 19 (and also JDK9) the frame pointer information is fixed by using the option -XX:+PreserveFramePointer. Later on I’ll show both how to generate JVM only flame graphs as well as OS level flame graphs.

If you only like to generate JVM flame graphs, the following tools are needed:

To generate flame graphs for Linux as I’ll show later on, you’ll also need the following:

Generating Java-only Flame Graphs

For this example we’ll use a really simple application built by colleagues to do some CPU profiling: Profiler Schulung. While this will not generate fancy results in any way, it’s simple to run, runs only for a short time, and makes the differences between results on JVM and OS level profiling very clear. The application can be built using Maven.

To generate the Java flame graphs we’ll use the Honest Profiler. Besides not being troubled by the safepoint bias, the profiler also includes code to transform the stack information ready for the Flame Graph tool to be processed (although this is somewhat hidden).

First step is to include the Honest Profiler agent when executing the application. The full command will look something like this:

java -agentpath:/honest-profiler/liblagent.so=interval=7,logPath=/tmp/log.hpl -cp Cpu-0.0.1-SNAPSHOT.jar de.codecentric.training.javaprofiling.cpu.PrimeFinderRunner 1000000 1 1

The application may be started immediately. When finished, this will output the /tmp/log.hpl file containing the raw stack information.
This needs to be converted to the folded stack data as such:

java -cp /honest-profiler/honest-profiler.jar com.insightfullogic.honest_profiler.ports.console.FlameGraphDumperApplication /tmp/log.hpl /tmp/log.folded

From this folded stack information we can now create the flame graph:

/FlameGraph/flamegraph.pl /tmp/log.folded > /tmp/flamegraph-java.svg

If everything went successfully, this should result in the following graph:

You might get a slightly different result, because small code is being inlined. This can be solved either by using the option -XX:InlineSmallCode=100 to prevent inlining except for really small code blocks. Or using the other (preferred) options: -XX:+UnlockDiagnosticVMOptions -XX:+DebugNonSafepoints.

As you see in the graph, also some ‘unknown’ AGCT.Unknown Java samples are visible. These could not be mapped to capture the stack information and tools such as VisualVM will not show them at all. In the next step they will become visible with Linux Perf.

Generating Linux Flame Graphs

Now for the interesting stuff, where we’ll use Linux perf_events to do profiling on OS level. What’s important to note is that the perf command runs separately from the Java process, so we’ll first need to start Java and then start the profiling. Also we need to grab the stack information separately to fill in the blanks. As the JVM does inlining, class loading, garbage collection and other stuff, make sure to profile a JVM that is ‘stable’, otherwise stale data is read and certain stack frames might not be able to get mapped.

To make our lives a bit easier, the perf-map-agent tool includes some scripts for running the perf command together with grabbing the stack mappings. There is even one script to additionally create the flame graph but for clarity we’ll do that step manually.

First start the Java application with the frame pointer information enabled:

java -XX:+PreserveFramePointer -cp Cpu-0.0.1-SNAPSHOT.jar de.codecentric.training.javaprofiling.cpu.PrimeFinderRunner 1000000 1 1

Search for the PID of the running process (ps aux | grep java). Then start the profiling for 30 seconds and afterwards automatically fetch the stack mapping information:

export PERF_RECORD_SECONDS=30
perf-map-agent/bin/perf-java-record-stack <PID>

By default this will create some output files in /tmp. We still need the perf_events information to be extracted though:

sudo perf script -i /tmp/perf-<PID>.data > /tmp/out-<PID>.stacks

And finally we can fold the stack information and create the flame graph in one go:

/FlameGraph/stackcollapse-perf.pl /tmp/out-<PID>.stacks | tee /tmp/out-<PID>.collapsed | /FlameGraph/flamegraph.pl --color=java --hash > /tmp/flamegraph.svg

Now you should see the following output:

As you can see in the image created, there’s more information visible besides the single running thread. These are all actions the JVM runs internally such as garbage collection. In the first example from a Spring Boot application much more stacks can be found.

I invite you to install the tooling yourself and give it a go. See what kind of flame graphs are created and how you can navigate them to get all information out. See below for some links to get you started or for more in-depth information.

Cheers!

Interesting Links / Further Information

Tags

Comment

Your email address will not be published. Required fields are marked *