Improving Java performance and energy dissipation through efficient code caching

The traditional Java code generation and instruction fetch path is inefficient: dynamically generated Java binary code is typically written into the data cache first and is then loaded into the instruction cache through the shared L2 cache or memory, which costs both time and energy. In this paper, we study three hardware-based code caching strategies that attempt to write and read dynamically generated Java code faster and more energy-efficiently. Our experimental results indicate that, with proper architectural support, writing code directly into the instruction cache improves performance across a variety of Java applications by 9.6% on average and by up to 42.9%. Efficient code caching also reduces the average energy dissipation of these Java programs by 6%.
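
To make the contrast between the two paths concrete, the following is a minimal toy cost model, not the simulator or any of the three strategies evaluated in the paper. Every name and every per-line cost in it (`Costs`, `conventional_path`, `direct_icache_path`, and the cycle values) is a hypothetical placeholder chosen for illustration; it only counts the steps that each path performs per cache line of generated code.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Costs:
    """Hypothetical per-cache-line costs in cycles (placeholders, not measurements)."""
    dcache_write: int = 1    # write a line of generated code into the data cache
    l2_writeback: int = 10   # move that line from the data cache to the shared L2
    l2_to_icache: int = 10   # fill the instruction cache from L2 on a fetch miss
    icache_write: int = 1    # write a line directly into the instruction cache
    icache_fetch: int = 1    # fetch a line that is already in the instruction cache


def conventional_path(lines: int, c: Costs = Costs()) -> int:
    """Code goes data cache -> L2 -> instruction cache before it can be fetched."""
    per_line = c.dcache_write + c.l2_writeback + c.l2_to_icache + c.icache_fetch
    return lines * per_line


def direct_icache_path(lines: int, c: Costs = Costs()) -> int:
    """Code is written straight into the instruction cache (assumed hardware support)."""
    per_line = c.icache_write + c.icache_fetch
    return lines * per_line


if __name__ == "__main__":
    n = 4  # number of cache lines of generated code (arbitrary)
    print("conventional:", conventional_path(n))
    print("direct:", direct_icache_path(n))
```

The model ignores cache capacity, evictions, coherence between the two L1 caches, and energy per access, all of which a real evaluation would have to account for; it is meant only to show where the extra round trip through L2 enters the conventional path.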
