During Attila Szegedi's talk about “lessons learned about the JVM” at QCon London, I was surprised that he emphasized the importance of knowing how much data you store in memory. Object size is rarely a concern in enterprise Java programming, but he gave a good example of what they had to do at Twitter.
Recap: Memory Footprint of Data
Question: How much memory does the String “Hello World” consume?
Answer: 62/86 Bytes (32/64 bit Java)!
This breaks down into 8/16 (object header of the String) + 11 * 2 (characters) + [8/16 (object header of the char array) + 4 (array length), padded to 16/24] + 4 (offset) + 4 (count) + 4 (hash code) + 4/8 (reference to the char array). [On 64-bit, the String object itself is padded to 40 bytes.]
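To make the arithmetic easier to check, here is a minimal sketch that recomputes the numbers from the breakdown above. The class and method names (StringFootprint, estimate, pad8) are made up for illustration, and the 8-byte alignment is an assumption taken from the padding figures in the text. It models the field layout described here and is not a measurement of a real JVM.

```java
final class StringFootprint {

    // Rounds a size up to the next multiple of 8 bytes (assumed object alignment).
    static int pad8(int size) {
        return (size + 7) / 8 * 8;
    }

    // Estimates the bytes used by a String of the given length,
    // following the breakdown in the text (not a real measurement).
    static int estimate(int length, boolean is64Bit) {
        int header = is64Bit ? 16 : 8;      // object header
        int reference = is64Bit ? 8 : 4;    // reference to the char array

        // String object: header + offset + count + hash code + reference
        int stringObject = pad8(header + 4 + 4 + 4 + reference);

        // char array overhead: header + array length field, padded
        int arrayOverhead = pad8(header + 4);

        // two bytes per character
        int characters = length * 2;

        return stringObject + arrayOverhead + characters;
    }

    public static void main(String[] args) {
        int n = "Hello World".length();
        System.out.println("32 bit: " + estimate(n, false) + " bytes");
        System.out.println("64 bit: " + estimate(n, true) + " bytes");
    }
}
```

For the 11 characters of “Hello World” this reproduces the 62 and 86 bytes given above. Only 22 of those bytes are character data; the rest is per-object overhead.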
The Problem
Imagine you have a lot of Locations attached to tweets in your data store. The implementation of the location as a Java class could look like this:
class Location { String city; String region; String countryCode; double lon; double lat; } // "long" is a reserved word in Java, so the longitude field is named lon
So if you load all the locations of tweets ever made, you obviously load a lot of String objects, and at Twitter's scale many of them are certainly duplicates. Attila said that this data did not fit into a 32 GB heap. So the question is: can we reduce memory consumption, so that all Locations fit into memory?
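Since the paragraph above points at duplicate Strings, one possible direction is to share a single instance per distinct value while loading. The following is only a sketch under that assumption, not necessarily the approach described in the talk. StringCanonicalizer and its methods are hypothetical names.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical helper: keeps one shared String instance per distinct value.
final class StringCanonicalizer {

    private final Map<String, String> pool = new HashMap<>();

    // Returns the pooled instance equal to s, adding s to the pool if it is new.
    String canonical(String s) {
        if (s == null) {
            return null;
        }
        String existing = pool.putIfAbsent(s, s);
        return existing != null ? existing : s;
    }

    // Number of distinct values seen so far.
    int size() {
        return pool.size();
    }

    public static void main(String[] args) {
        StringCanonicalizer cities = new StringCanonicalizer();
        // new String(...) forces two distinct objects with equal content.
        String a = cities.canonical(new String("London"));
        String b = cities.canonical(new String("London"));
        System.out.println(a == b);        // both references point to one object
        System.out.println(cities.size()); // one distinct value stored
    }
}
```

With such a pool, every Location for the same city would reference the same String object instead of holding its own copy. The pool itself costs memory and is not thread-safe as written, so whether it pays off depends on how many duplicates the data really contains.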