Running parallel bytecode interpreters on heterogeneous hardware

Since the early days of managed runtime systems with tiered JIT compilation, several research efforts have sought to accelerate bytecode execution. In this paper, we extend that work with an initial analysis of whether heterogeneous hardware accelerators, namely Graphics Processing Units (GPUs) and Field Programmable Gate Arrays (FPGAs), can improve performance in bytecode interpreter mode. To answer this question, we implemented a simple parallel Java bytecode interpreter in OpenCL and executed it on a range of devices, including GPUs and FPGAs. Our preliminary evaluation shows that, for specific workloads, hardware acceleration can yield up to 17x better performance than traditional optimized interpreters running on Intel CPUs, and up to 214x better performance than those running on ARM CPUs.
