Efficient memory management for hardware-accelerated Java Virtual Machines

Application-specific hardware accelerators can significantly improve a system's performance. In a Java-based system, this leads to a hybrid architecture in which a Java Virtual Machine running on a general-purpose processor is connected to the hardware accelerator. In such an architecture, data communication between the accelerator and the general-purpose processor can incur a significant cost, which may even cancel out the performance gained by adding the accelerator. A careful layout of data in memory is therefore essential to preserve the benefits of acceleration. This article addresses reducing the communication cost in a distributed shared memory that consists of the processor's main memory and the accelerator's local memory, unified in the Java heap. Because memory access times are highly nonuniform, allocating each object in the appropriate memory, either main memory or the accelerator's local memory, can significantly reduce the communication cost. We propose several techniques for finding the optimal location for each Java object's data, either statically through profiling or dynamically at runtime. We show that these techniques reduce communication cost by up to 86% for the SPECjvm and DaCapo benchmarks. We also show that the best strategy depends on the application and on the relative cost of remote versus local accesses. When remote accesses are more than 10 times as costly as local ones, a self-learning dynamic approach often gives the best performance.
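
The placement trade-off described above can be illustrated with a minimal sketch. It is not the article's actual technique, which the abstract does not detail. It assumes a simple cost model in which a local access costs 1 unit and a remote access costs `remoteCostRatio` units, and it assumes that per-object access counts from the processor and from the accelerator are available, for instance from a profiling run. The class name `PlacementSketch`, the `Location` enum, and both methods are hypothetical names introduced for illustration.

```java
// Illustrative sketch only: a static, profile-driven placement decision
// under an assumed two-level cost model (local access = 1 unit,
// remote access = remoteCostRatio units). All names are hypothetical.
final class PlacementSketch {

    enum Location { MAIN_MEMORY, ACCELERATOR_MEMORY }

    /**
     * Communication cost of keeping one object at the given location,
     * given how often each side accesses it.
     */
    static double cost(long cpuAccesses, long acceleratorAccesses,
                       Location location, double remoteCostRatio) {
        // Accesses from the side that owns the memory are local;
        // accesses from the other side are remote.
        long local = (location == Location.MAIN_MEMORY)
                ? cpuAccesses : acceleratorAccesses;
        long remote = (location == Location.MAIN_MEMORY)
                ? acceleratorAccesses : cpuAccesses;
        return local + remoteCostRatio * remote;
    }

    /**
     * Picks the cheaper location for one object. Ties go to main memory,
     * which is assumed here to be the default allocation target.
     */
    static Location cheaperLocation(long cpuAccesses, long acceleratorAccesses,
                                    double remoteCostRatio) {
        double inMain = cost(cpuAccesses, acceleratorAccesses,
                Location.MAIN_MEMORY, remoteCostRatio);
        double inAccelerator = cost(cpuAccesses, acceleratorAccesses,
                Location.ACCELERATOR_MEMORY, remoteCostRatio);
        return inAccelerator < inMain
                ? Location.ACCELERATOR_MEMORY : Location.MAIN_MEMORY;
    }

    public static void main(String[] args) {
        // Made-up profile: 100 processor accesses, 900 accelerator accesses,
        // remote accesses 10 times as costly as local ones.
        long cpu = 100, acc = 900;
        double ratio = 10.0;
        System.out.println(cost(cpu, acc, Location.MAIN_MEMORY, ratio));        // 9100.0
        System.out.println(cost(cpu, acc, Location.ACCELERATOR_MEMORY, ratio)); // 1900.0
        System.out.println(cheaperLocation(cpu, acc, ratio));                   // ACCELERATOR_MEMORY
    }
}
```

In this made-up example the object is accessed mostly by the accelerator, so placing it in the accelerator's local memory is cheaper under the assumed model. A dynamic variant would update the access counts at runtime and migrate objects when the cheaper location changes, at the price of a migration cost that this sketch ignores.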
