The performance and stability of Java applications in production are becoming more and more critical. Most (if not all) of today's businesses rely on software. More and more of these applications use web or mobile technologies to offer services directly to customers or to integrate with partners. These applications often rely on other applications or services to deliver the expected result. The infrastructure on which these services and applications run is virtualized and will move to public or private clouds in the near future.
In a recent article, Gunther Dueck from IBM wrote that we can try to romanticize the good old times when everything was easier and better than today, but one day we have to accept SOA, Cloud and Mobile Apps. They are here and will not go away soon.
As “firefighters” for enterprise applications, we also see a shift in the problem areas of mainly Java-based web and enterprise applications. Some years ago the problems were mainly “code problems”: Hibernate was not configured correctly, or a memory leak brought the system down. These issues are still there, but we see more and more architectural problems that result from the rising complexity and interdependency of these applications and from higher demands on performance and scalability. An application may handle 5,000 users, but is it designed to handle 100,000? Can it execute a request in 1 second if it aggregates 20 remote service calls? What happens if one service is not available or doesn’t reply in time?
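To make the last two questions concrete: if the 20 calls run one after another, a 1-second budget leaves an average of 50 ms per call, so the calls have to run in parallel and each needs a deadline. The following is a minimal sketch of that idea, not a recommended implementation. It assumes Java 9 or later (for `completeOnTimeout`); the class name, the `fakeService` stand-in for a remote call, the 500 ms timeout and the `"n/a"` fallback value are all hypothetical and chosen only for illustration.

```java
import java.time.Duration;
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.function.Supplier;
import java.util.stream.Collectors;
import java.util.stream.IntStream;

public class AggregationSketch {

    // Hypothetical placeholder returned when a service fails or is too slow.
    public static final String FALLBACK = "n/a";

    // Starts all calls in parallel on the given pool. A call that throws, or that has
    // not finished within the timeout, contributes FALLBACK instead of blocking the request.
    // Note: the timeout only stops waiting; it does not cancel the underlying call.
    public static List<String> aggregate(List<Supplier<String>> calls,
                                         Duration timeout,
                                         ExecutorService pool) {
        List<CompletableFuture<String>> futures = calls.stream()
                .map(call -> CompletableFuture.supplyAsync(call, pool)
                        .completeOnTimeout(FALLBACK, timeout.toMillis(), TimeUnit.MILLISECONDS)
                        .exceptionally(ex -> FALLBACK))
                .collect(Collectors.toList());
        // Every future completes within the timeout, so join() cannot hang here.
        return futures.stream().map(CompletableFuture::join).collect(Collectors.toList());
    }

    // Hypothetical stand-in for a remote service call with a fixed latency.
    public static Supplier<String> fakeService(int id, long latencyMillis) {
        return () -> {
            try {
                Thread.sleep(latencyMillis);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException(e);
            }
            return "service-" + id;
        };
    }

    public static void main(String[] args) {
        ExecutorService pool = Executors.newFixedThreadPool(20);
        try {
            // 20 simulated services; number 7 is far too slow for the budget.
            List<Supplier<String>> calls = IntStream.range(0, 20)
                    .mapToObj(i -> fakeService(i, i == 7 ? 5_000 : 50))
                    .collect(Collectors.toList());

            long start = System.nanoTime();
            List<String> results = aggregate(calls, Duration.ofMillis(500), pool);
            long elapsedMs = TimeUnit.NANOSECONDS.toMillis(System.nanoTime() - start);

            System.out.println(results);
            System.out.println("elapsed ms: " + elapsedMs);
        } finally {
            pool.shutdownNow(); // interrupts the slow call that is still sleeping
        }
    }
}
```

In this sketch the slow service costs the request at most the timeout rather than its full latency, and the caller has to decide what a partial result means for the user. Whether a fallback, a retry or a failed request is the right answer is exactly the kind of architectural decision that is often left open until production.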
These questions are mostly left unanswered before the application begins its life in production. When a problem then occurs, the application looks like a big monster or a black box, and finding the root cause can be a long and expensive process.
The problem gets even worse with agile development processes, where we deploy new versions of an application every 2-4 weeks. Some internet companies like Flickr even deploy more than 10 times a day. So the applications can change every day or week, and the half-life of your knowledge about the application is getting shorter and shorter.
So how can you approach performance or stability issues that occur in production in such an environment? In general, a profiler is not the right tool. It has too much overhead, normally works for just a single JVM, and is designed for use in the development environment or at the developer's workplace. These environments are often small, locally deployed setups that are not even close to a highly distributed SOA and/or Cloud scenario. In addition, problems that result from high load, concurrency or legacy systems cannot easily be reproduced in the development environment.
Classical monitoring or application performance management tools are in most cases not designed to work in highly distributed and heavily loaded environments, so they generally do not provide the data we need to analyze the root cause of a problem. In addition, they normally do not respond to change automatically and need a lot of configuration and instrumentation work. (And they cost thousands of dollars per JVM, even if you just want to troubleshoot your problem.)
AppDynamics is a San Francisco-based start-up that has developed what they call a “next generation performance management tool”. It was designed from scratch by experienced APM engineers to support exactly these highly distributed SOA/Cloud production environments. AppDynamics can monitor a distributed Java EE application without the need for custom instrumentation or configuration. It automatically creates an application map (see screenshot) which shows the number of calls and the response times of any interesting call, such as database, web service or JMS calls. It also gives you a full call stack, down to method level, of any interesting business transaction. This is done by an “intelligent” rule base that uses statistics about your application to decide which information to show. This means you can get inside information on all of your applications in 15 minutes, which is the time it takes to install the AppDynamics server and agent(s).
You don’t believe this? (I didn’t either when I first talked to the AppDynamics team.) There is good news: AppDynamics Lite is a free edition that you can download to troubleshoot slow response times, slow SQL statements, errors and stalls in production. The Lite edition has some limits, but in most cases it will be sufficient to troubleshoot Java problems in (pre-)production. It installs in less than 15 minutes, so give it a try and download it for free here: http://www.appdynamics.com/free
Fabian recorded some really nice screencasts to demo the features of AppDynamics Lite. In his blog you will find “hands-on” screencasts on installing AppDynamics Lite and on how to find production issues.