Analysis and reduction of memory inefficiencies in Java strings

This paper describes a novel approach to reducing the memory consumption of Java programs by focusing on their "string memory inefficiencies". In recent Java applications, string data occupies a large portion of the heap. For example, about 40% of the live heap is used for string data when a production J2EE application server is running. By investigating the string data in the live heap, we identified two types of memory inefficiency: "duplication" and "unused literals". The heap contains many string objects that have the same values. It also contains many string literals whose values are never actually used by the application. Because these inefficiencies exist as live objects, existing garbage collection techniques cannot eliminate them, since those techniques only remove dead objects. Quantitative analysis of Java heaps in real applications revealed that these inefficiencies waste more than 50% of the string data in the live heap. To reduce them, this paper proposes two techniques at the Java virtual machine level. "StringGC" eliminates duplicated strings at garbage collection time. "Lazy Body Creation" delays part of the literal instantiation until the literal's value is actually used. We also present a technique at the Java program level, which we call "BundleConverter", that prevents unused message literals from being instantiated. Prototype implementations on a production Java virtual machine reduced the live heap of the production application server by about 18%. The proposed techniques also reduced the live heap of standard Java benchmarks by 11.6% on average, without noticeable performance degradation.
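The "duplication" inefficiency and the effect of merging duplicates can be shown at the application level. The sketch below scans a list of strings and points every element that equals an earlier one at a single canonical instance. It then reports how many duplicate objects became unreachable and how many characters they held. This is only an analogy under stated assumptions. The paper's StringGC works inside the JVM during garbage collection, not over a user-supplied list. The class `StringDedupSketch`, its `Result` type, and the method `dedup` are hypothetical names introduced for illustration.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/**
 * Hypothetical application-level sketch of string deduplication.
 * NOT the paper's JVM-internal StringGC; it only mimics its effect on a list.
 */
public final class StringDedupSketch {

    /** Outcome of one deduplication pass. */
    public static final class Result {
        public final int duplicateObjects; // String objects replaced by a canonical copy
        public final long savedChars;      // characters held by those replaced copies

        Result(int duplicateObjects, long savedChars) {
            this.duplicateObjects = duplicateObjects;
            this.savedChars = savedChars;
        }
    }

    /**
     * Replaces, in place, every element that equals an earlier element with a
     * reference to that earlier (canonical) instance.
     */
    public static Result dedup(List<String> strings) {
        Map<String, String> canonical = new HashMap<>();
        int duplicates = 0;
        long saved = 0;
        for (int i = 0; i < strings.size(); i++) {
            String s = strings.get(i);
            String first = canonical.putIfAbsent(s, s);
            // Same value but a different object: a wasted duplicate copy.
            if (first != null && first != s) {
                strings.set(i, first);
                duplicates++;
                saved += s.length();
            }
        }
        return new Result(duplicates, saved);
    }

    public static void main(String[] args) {
        List<String> heap = new ArrayList<>();
        // new String(char[]) copies the array, so each element gets its own
        // String object and character body, mimicking repeatedly parsed input.
        for (int i = 0; i < 3; i++) {
            heap.add(new String("Content-Type".toCharArray()));
        }
        heap.add(new String("Accept".toCharArray()));

        Result r = dedup(heap);
        // prints "2 duplicates, 24 chars reclaimable"
        System.out.println(r.duplicateObjects + " duplicates, "
                + r.savedChars + " chars reclaimable");
        // prints "true": both elements now share one canonical object
        System.out.println(heap.get(0) == heap.get(2));
    }
}
```

A `HashMap` keyed by value keeps the sketch short. A JVM-level approach can instead find equal strings while the collector already visits every live object, so the application does not have to hold a separate table.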
