A Quantitative Analysis of Space Waste from Java Strings and its Elimination at Garbage Collection Time

This paper describes a novel approach to reducing the memory consumption of Java programs by reducing string memory waste in the runtime. In recent Java applications, string data occupies a large portion of the heap. For example, more than 30% of the live heap is used for string data when WebSphere Application Server is running Trade6. By investigating the string data in real Java applications, we found two types of memory waste in typical Java string implementations. First, many String objects have the same values. Second, the char arrays that hold string values contain many unused areas. Because this waste consists of live objects or of unused space inside live objects, it cannot be eliminated by existing garbage collection techniques, which only reclaim dead objects. A quantitative analysis of the Java heap revealed that such waste occupied up to 17% of the live heap in real Java applications. To remove this string memory waste, we propose a new "string garbage collection" (StringGC) technique for Java. StringGC works alongside a conventional garbage collector in a JVM, unifying same-value String objects and removing the unused areas in char arrays. We implemented a StringGC prototype named "UNITE" in an IBM production JVM, in which same-value strings are unified when they are tenured by a generational GC. This prototype eliminated more than 90% of the string memory waste and reduced the live heap size of real Java applications by up to 15% without noticeable performance degradation.
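The two kinds of waste, and what unifying them at tenure time means, can be illustrated with a small user-level model. This is a minimal sketch, not UNITE itself: it assumes a String layout with a shared `char[]` plus `offset` and `count` fields, in which a substring can point into a larger backing array and so leave unused areas. The class `StrRec`, the method `unify`, and the content-keyed `HashMap` table are hypothetical names introduced for illustration; a real StringGC would perform this inside the collector as objects are tenured, not in Java code.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.Map;

public class StringUnifySketch {
    // Hypothetical model of a String whose value is a slice of a (possibly shared) char array.
    static final class StrRec {
        char[] value;
        int offset;
        int count;

        StrRec(char[] value, int offset, int count) {
            this.value = value;
            this.offset = offset;
            this.count = count;
        }

        String content() {
            return new String(value, offset, count);
        }
    }

    // Hypothetical table of canonical records, keyed by string content.
    private final Map<String, StrRec> table = new HashMap<>();

    /**
     * Called (in this model) when a string is tenured. Returns the canonical record
     * for the same value; the first record with a given value has its char array
     * trimmed to exactly `count` chars so no unused area remains.
     */
    StrRec unify(StrRec s) {
        String key = s.content();
        StrRec canon = table.get(key);
        if (canon != null) {
            return canon; // same value already tenured: reuse it, s becomes garbage
        }
        if (s.offset != 0 || s.count != s.value.length) {
            // Copy only the used slice; other strings sharing the old array are unaffected.
            s.value = Arrays.copyOfRange(s.value, s.offset, s.offset + s.count);
            s.offset = 0;
        }
        table.put(key, s);
        return s;
    }

    public static void main(String[] args) {
        char[] big = "hello, world".toCharArray();
        StrRec a = new StrRec(big, 0, 5);                    // "hello" with 7 unused chars
        StrRec b = new StrRec("hello".toCharArray(), 0, 5);  // duplicate value
        StringUnifySketch u = new StringUnifySketch();
        StrRec ca = u.unify(a);
        StrRec cb = u.unify(b);
        System.out.println(ca == cb);         // prints "true"
        System.out.println(ca.value.length);  // prints "5"
    }
}
```

In this model, references to `b` would be redirected to the canonical record, so both the duplicate object and its array become reclaimable, and the trimmed copy lets the 12-char backing array be freed once no other string refers to it.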