Characterizing the memory behavior of Java workloads: a structured view and opportunities for optimizations

This paper studies the memory behavior of important Java workloads used in benchmarking Java Virtual Machines (JVMs), based on instrumentation of both application and library code in a state-of-the-art JVM, and provides structured information about these workloads to help guide system design. We begin by characterizing the inherent memory behavior of the benchmarks, such as the breakdown of heap accesses among different categories and the hotness of references to fields and methods. We then provide detailed information about misses in the data TLB and caches, including the distribution of misses over different kinds of accesses and over different methods. In the process, we report new findings on TLB behavior and on the limitations of data prefetching schemes from the literature when they are applied to pointer-intensive Java code. Throughout this paper, we develop a set of recommendations for computer architects and compiler writers on how to optimize computer systems and system software to run Java programs more efficiently. This paper also makes the first attempt to compare the characteristics of SPECjvm98 with those of a server-oriented benchmark, pBOB, and to explain why the current set of SPECjvm98 benchmarks may not be adequate for a comprehensive and objective evaluation of JVMs and just-in-time (JIT) compilers.

We find that the fraction of accesses to array elements is quite significant, demonstrate that the number of "hot spots" in the benchmarks is small, and show that field reordering cannot yield significant performance gains. We also show that even a fairly large L2 data cache is not effective for many Java benchmarks. We observe that instructions used to prefetch data into the L2 data cache are often squashed, both because TLB miss rates are high and because the TLB usually does not hold the translation needed to prefetch the data into the L2 data cache.
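The prefetch-squashing observation can be illustrated with a toy model. The sketch below assumes a fully associative, LRU-managed data TLB and assumes that a software prefetch is simply dropped when its target page has no translation in the TLB (real processors differ on whether a prefetch may start a page walk). `PAGE_SIZE`, `LRUTLB`, `run_with_prefetch`, the prefetch distance, and both address traces are hypothetical choices for illustration, not the paper's instrumentation.

```python
from collections import OrderedDict

PAGE_SIZE = 4096  # hypothetical page size in bytes


class LRUTLB:
    """Fully associative data TLB with LRU replacement (a simplifying assumption)."""

    def __init__(self, entries):
        self.entries = entries
        self.pages = OrderedDict()

    def contains(self, page):
        # A lookup that does not change LRU order (used for prefetch checks)
        return page in self.pages

    def access(self, page):
        """Demand access: return True on a hit; on a miss, install the translation."""
        if page in self.pages:
            self.pages.move_to_end(page)  # mark as most recently used
            return True
        self.pages[page] = None
        if len(self.pages) > self.entries:
            self.pages.popitem(last=False)  # evict least recently used
        return False


def run_with_prefetch(addresses, tlb_entries, distance=1):
    """Demand-load each address and prefetch the one `distance` elements ahead.

    A prefetch is squashed unless its target page is already in the TLB.
    Returns (tlb_misses, prefetches_issued, prefetches_squashed).
    """
    tlb = LRUTLB(tlb_entries)
    misses = issued = squashed = 0
    for i, addr in enumerate(addresses):
        if not tlb.access(addr // PAGE_SIZE):
            misses += 1
        if i + distance < len(addresses):
            target_page = addresses[i + distance] // PAGE_SIZE
            if tlb.contains(target_page):
                issued += 1
            else:
                squashed += 1
    return misses, issued, squashed


# 1,000 linked objects, one per page (pointer chasing over a sparse heap)
scattered = [k * PAGE_SIZE for k in range(1000)]
# The same 1,000 objects packed 64 bytes apart
packed = [k * 64 for k in range(1000)]
print(run_with_prefetch(scattered, tlb_entries=64))  # (1000, 0, 999)
print(run_with_prefetch(packed, tlb_entries=64))     # (16, 984, 15)
```

With one object per page, every prefetch targets a page that has not yet been translated, so all of them are squashed in this model; packing the same objects densely lets nearly all prefetches proceed. The numbers come from the contrived traces above and are not measurements from the paper.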
We also find that co-allocation of frequently used method tables can reduce the number of TLB misses and lower the cost of accessing type information block entries in virtual method calls and runtime type checking.
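The effect of co-allocating method tables on TLB misses can be sketched with a similar toy model. It assumes a fully associative LRU TLB, treats each virtual call as one load from the receiver class's method table followed by one heap data access, and compares two hypothetical layouts: each method table on its own page versus hot method tables packed contiguously. `TIB_SIZE`, `DATA_BASE_PAGE`, `tib_page`, `simulate`, and all parameters are illustrative assumptions, not values from the paper.

```python
from collections import OrderedDict

PAGE_SIZE = 4096            # hypothetical page size in bytes
TIB_SIZE = 256              # hypothetical size of one method table (TIB) in bytes
DATA_BASE_PAGE = 1_000_000  # hypothetical first page of the heap-data region


def count_tlb_misses(pages, entries):
    """Count misses of a fully associative LRU TLB over a trace of page numbers."""
    tlb = OrderedDict()
    misses = 0
    for page in pages:
        if page in tlb:
            tlb.move_to_end(page)  # mark as most recently used
        else:
            misses += 1
            tlb[page] = None
            if len(tlb) > entries:
                tlb.popitem(last=False)  # evict least recently used
    return misses


def tib_page(class_id, co_allocated):
    """Page holding a class's method table under two hypothetical layouts.

    Scattered: each method table sits on its own page, next to unrelated
    class metadata. Co-allocated: hot method tables are packed contiguously.
    """
    address = class_id * (TIB_SIZE if co_allocated else PAGE_SIZE)
    return address // PAGE_SIZE


def simulate(num_classes, calls, data_pages, entries, co_allocated):
    """Interleave method-table loads (virtual calls) with heap data accesses."""
    trace = []
    for i in range(calls):
        trace.append(tib_page(i % num_classes, co_allocated))  # dispatch lookup
        trace.append(DATA_BASE_PAGE + i % data_pages)          # receiver's data
    return count_tlb_misses(trace, entries)


print(simulate(32, 10_000, 32, 48, co_allocated=False))  # 20000
print(simulate(32, 10_000, 32, 48, co_allocated=True))   # 34
```

Here the 48-entry TLB can hold the 34 pages touched in the co-allocated layout but not the 64 pages of the scattered one, so LRU replacement thrashes on the latter. The size of the gap is an artifact of this deliberately cyclic trace; the point is only that packing hot method tables onto fewer pages shrinks the TLB footprint of dispatch and type-check lookups.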
