Optimizing iText performance using AppDynamics and YourKit


The following example shows how easy it is to combine a performance monitoring solution with a profiler.
On a regular patrol through the AppDynamics monitoring of our continuously integrated projects, I found an interesting hotspot in iText. iText, a previously free but now commercial Java library, makes it easy to parse PDFs by just using a PdfReader:

PdfReader reader = new PdfReader(filename);

That's really easy!
But in fact this does more than expected, as you can easily see in the following screenshot showing code hotspots in AppDynamics:

[Screenshot: code hotspots in AppDynamics]

About 20% of the whole transaction is spent on opening 2 PDFs. I got curious. What is happening there? Could this be slow code?

The invocation trace shows that inside the constructor of PdfReader a PRTokeniser is created, which itself creates a RandomAccessFileOrArray. This in turn opens a RandomAccessFile or a MappedRandomAccessFile. The latter uses a FileChannel to read the file into memory. It looks like a simple approach, but with a lot of delegation, so there should be alternatives to play with.
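
To make the last step of that chain more concrete, here is a minimal sketch of reading a file into memory through a FileChannel mapping, using only the JDK. It is not iText's actual code; the class name MappedReadSketch, its methods and the temporary sample file are assumptions made for illustration.

```java
import java.io.IOException;
import java.io.RandomAccessFile;
import java.io.UncheckedIOException;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

// Illustration only: a hypothetical helper, not part of iText.
class MappedReadSketch {

    /** Reads the whole file through a read-only memory mapping of its FileChannel. */
    static byte[] readMapped(Path file) {
        try (RandomAccessFile raf = new RandomAccessFile(file.toFile(), "r");
             FileChannel channel = raf.getChannel()) {
            MappedByteBuffer buffer =
                    channel.map(FileChannel.MapMode.READ_ONLY, 0, channel.size());
            byte[] bytes = new byte[buffer.remaining()];
            buffer.get(bytes); // copy the mapped region into a heap array
            return bytes;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Creates a small temporary file so the sketch runs without a real PDF. */
    static Path createSampleFile(String content) {
        try {
            Path file = Files.createTempFile("mapped-sketch", ".bin");
            Files.write(file, content.getBytes(StandardCharsets.UTF_8));
            file.toFile().deleteOnExit();
            return file;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        Path file = createSampleFile("%PDF-1.4 sample bytes");
        byte[] bytes = readMapped(file);
        System.out.println(bytes.length + " bytes read via mapping");
    }
}
```

Every byte of the file ends up in memory here, which is cheap for a tiny sample but adds up when it happens for every PDF that is opened.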

So I wrote a little microbenchmark for our code which reads fields from PDF files.
After 3000 warmup calls, I ran 10000 benchmarked calls, which totaled 12.3 seconds on my machine (Win 7 32bit, Java 1.6.20, -server).
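
A benchmark loop of this shape could look roughly like the following sketch. It is not the original benchmark code: the class MicroBenchmark, its Result type and the placeholder workload are assumptions, and the real task would be the PDF field-reading call.

```java
import java.util.function.Supplier;

// Illustration only: a hypothetical harness, not the benchmark used in this post.
class MicroBenchmark {

    /** Outcome of one run: how many calls were timed and how long they took. */
    static final class Result {
        final int measuredCalls;
        final long elapsedNanos;

        Result(int measuredCalls, long elapsedNanos) {
            this.measuredCalls = measuredCalls;
            this.elapsedNanos = elapsedNanos;
        }
    }

    // Storing results in a volatile field makes it harder for the JIT to drop the work.
    static volatile Object sink;

    /** Runs the task untimed for warmupCalls, then times measuredCalls invocations. */
    static Result run(Supplier<?> task, int warmupCalls, int measuredCalls) {
        for (int i = 0; i < warmupCalls; i++) {
            sink = task.get(); // warmup: gives the JIT a chance to compile the hot path
        }
        long start = System.nanoTime();
        for (int i = 0; i < measuredCalls; i++) {
            sink = task.get();
        }
        return new Result(measuredCalls, System.nanoTime() - start);
    }

    public static void main(String[] args) {
        // Placeholder workload; replace with the code under test.
        Supplier<String> task = () -> new StringBuilder("pdf").reverse().toString();
        Result result = run(task, 3000, 10000);
        System.out.printf("%d calls took %.3f s%n",
                result.measuredCalls, result.elapsedNanos / 1e9);
    }
}
```

A hand-rolled loop like this is good enough for comparing two alternatives that differ by a factor of two; for finer differences a dedicated benchmarking harness would be the safer choice.
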
To understand what the code was doing, I used YourKit as a profiler. After the code had run, it produced this report:

I found an alternative constructor that takes a RandomAccessFileOrArray directly, so I tried it in order to be able to substitute the method for reading the file. To approach this in baby steps, I first did what the other constructor would have done:

PdfReader reader = new PdfReader(new RandomAccessFileOrArray(filename), null);

But I was surprised! This code took a lot less time for the 10k loops: 5.8 seconds. By just doing the same stuff as before? Really?
The JavaDoc revealed what was different here:

* Reads and parses a pdf document. Contrary to the other constructors only the xref is read
* into memory. The reader is said to be working in "partial" mode as only parts of the pdf
* are read as needed.

Interesting! This could perform a lot better for my use case, which only covers reading form fields.
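
The general idea behind reading only what is needed can be shown with a small analogy. The sketch below is not PDF parsing and not how iText implements partial mode; the class PartialReadSketch and its fixed-size record layout are assumptions chosen only to contrast loading a whole file with seeking to the one part that is needed.

```java
import java.io.IOException;
import java.io.RandomAccessFile;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Arrays;

// Illustration only: a hypothetical fixed-size record file, not the PDF format.
class PartialReadSketch {

    static final int RECORD_SIZE = 64;

    /** Writes recordCount records; record i consists of RECORD_SIZE bytes of value i. */
    static Path createRecordFile(int recordCount) {
        try {
            byte[] data = new byte[recordCount * RECORD_SIZE];
            for (int i = 0; i < recordCount; i++) {
                Arrays.fill(data, i * RECORD_SIZE, (i + 1) * RECORD_SIZE, (byte) i);
            }
            Path file = Files.createTempFile("partial-sketch", ".bin");
            Files.write(file, data);
            file.toFile().deleteOnExit();
            return file;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Eager variant: loads the complete file, then cuts out one record. */
    static byte[] readRecordEagerly(Path file, int index) {
        try {
            byte[] all = Files.readAllBytes(file);
            return Arrays.copyOfRange(all, index * RECORD_SIZE, (index + 1) * RECORD_SIZE);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** On-demand variant: seeks to the record and reads only RECORD_SIZE bytes. */
    static byte[] readRecordOnDemand(Path file, int index) {
        try (RandomAccessFile raf = new RandomAccessFile(file.toFile(), "r")) {
            raf.seek((long) index * RECORD_SIZE);
            byte[] record = new byte[RECORD_SIZE];
            raf.readFully(record);
            return record;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        Path file = createRecordFile(100);
        byte[] eager = readRecordEagerly(file, 42);
        byte[] onDemand = readRecordOnDemand(file, 42);
        System.out.println("same record: " + Arrays.equals(eager, onDemand));
    }
}
```

Both variants return the same bytes, but the on-demand one touches 64 bytes instead of the whole file. The trade-off is that on-demand reading only pays off when a small part of the file is actually used.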

YourKit proved this. No more traces of expensive constructors:

So this second variant is faster, and I was not even looking for it. But when working on performance, I recommend reaching for the low-hanging fruit. Very often you find that you need to change a slightly different place than the one exposing the performance issue. Since I made this improvement in our PDFService code, it has never shown up in performance reports again.

For me this is a very good example of how you should work towards improved performance:

  1. Monitor performance in production and test.
  2. Analyze anomalies.
  3. Evaluate different approaches to solving the problem using a profiler.
  4. Prefer doing small simple changes with large effect.
  5. Continue observing performance.


  • Renjie

    24 August 2011 by Renjie

    Thanks for your post, but I am still confused: what is the difference between YourKit and AppDynamics?

  • Fabian Lange

    Hi Renjie,
    there are lots of differences.
    AppDynamics is a monitoring tool, which runs in your production and test environments and tracks what is actually slow. As you know, you should optimize only real bottlenecks, not do premature optimization just because you think it could be slow.
    But AppDynamics is not a profiler and does not provide means of measuring every tiny method in your code, which sometimes is required for benchmarking alternative implementations.
    That is what I used YourKit, a profiler, for.

