HJ-OpenCL: Reducing the Gap Between the JVM and Accelerators

Recently, there has been increasing interest in supporting the execution of Java Virtual Machine (JVM) applications on accelerator architectures such as GPUs. Unfortunately, there is a large gap between the features of the JVM and those commonly supported by accelerators. Important JVM features that accelerators typically lack include exceptions, dynamic memory allocation, use of arbitrary composite objects, and file I/O. Recent work from our research group tackled the first feature in that list, JVM exception semantics [14]. This paper continues along that path by enabling the acceleration of JVM parallel regions that include object references and dynamic memory allocation. The contributions of this work include (1) serialization and deserialization of JVM objects using a format that is compatible with OpenCL accelerators, (2) advanced code generation techniques for converting JVM bytecode to OpenCL kernels when object references and dynamic memory allocation are used, (3) runtime techniques for supporting dynamic memory allocation on OpenCL accelerators, and (4) a novel redundant data movement elimination technique based on inter-parallel-region dataflow analysis using runtime bytecode inspection. Experimental results presented in this paper show performance improvements of up to 18.33× relative to parallel Java Streams for GPU-accelerated parallel regions, even when those regions include object references and dynamic memory allocation. In our evaluation, we characterize where the accelerator outperforms the JVM and where the JVM outperforms the accelerator, and we point out opportunities for future work.
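To make contribution (1) concrete, the following is a minimal sketch of what flattening JVM objects into an accelerator-friendly layout can look like. It is not the format used by HJ-OpenCL, which this abstract does not describe. The `Point` class, the `FlatPointSerializer` name, and the 12-byte stride are assumptions chosen for illustration: each object is written as a fixed-size record of primitive fields into a direct `ByteBuffer` in native byte order, the kind of buffer a host program could hand to an OpenCL binding and read on the device as an array of C structs.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

public class FlatPointSerializer {
    /** Hypothetical composite object, used only for illustration. */
    public static final class Point {
        public final float x;
        public final float y;
        public final int id;

        public Point(float x, float y, int id) {
            this.x = x;
            this.y = y;
            this.id = id;
        }
    }

    /** Bytes per serialized Point: two 4-byte floats plus one 4-byte int. */
    public static final int STRIDE = 12;

    /** Packs the primitive fields of each Point into one contiguous buffer. */
    public static ByteBuffer serialize(Point[] points) {
        ByteBuffer buf = ByteBuffer.allocateDirect(points.length * STRIDE)
                                   .order(ByteOrder.nativeOrder());
        for (Point p : points) {
            buf.putFloat(p.x).putFloat(p.y).putInt(p.id);
        }
        buf.flip(); // switch from writing to reading
        return buf;
    }

    /** Rebuilds Point objects from the flat buffer without consuming it. */
    public static Point[] deserialize(ByteBuffer buf) {
        // duplicate() resets the byte order, so it is restored explicitly.
        ByteBuffer view = buf.duplicate().order(buf.order());
        Point[] out = new Point[view.remaining() / STRIDE];
        for (int i = 0; i < out.length; i++) {
            out[i] = new Point(view.getFloat(), view.getFloat(), view.getInt());
        }
        return out;
    }
}
```

A fixed stride keeps element `i` at byte offset `i * STRIDE`, so a kernel could index records directly. Objects that hold references to other objects would need an additional scheme, such as replacing references with offsets into the buffer, which this sketch does not attempt.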

[1] Vivek Sarkar, et al. Cooperative Scheduling of Parallel Tasks with General Synchronization Patterns, 2014, ECOOP.

[2] Vivek Sarkar, et al. Habanero-Java library: a Java 8 framework for multicore programming, 2014, PPPJ.

[3] Vivek Sarkar, et al. The Eureka Programming Model for Speculative Task Parallelism, 2015, ECOOP.

[4] Vivek Sarkar, et al. HadoopCL2: Motivating the Design of a Distributed, Heterogeneous Programming System With Machine-Learning Applications, 2016, IEEE Transactions on Parallel and Distributed Systems.

[5] Vivek Sarkar, et al. Compiler-Driven Data Layout Transformation for Heterogeneous Platforms, 2013, Euro-Par Workshops.

[6] Vivek Sarkar, et al. Machine-Learning-based Performance Heuristics for Runtime CPU/GPU Selection, 2015, PPPJ.

[7] Michel Steuwer, et al. A Composable Array Function Interface for Heterogeneous Computing in Java, 2014, ARRAY@PLDI.

[8] Philip C. Pratt-Szeliga, et al. Rootbeer: Seamlessly Using GPUs from Java, 2012, 2012 IEEE 14th International Conference on High Performance Computing and Communication & 2012 IEEE 9th International Conference on Embedded Software and Systems.

[9] Vivek Sarkar, et al. X10: an object-oriented approach to non-uniform cluster computing, 2005, OOPSLA '05.

[10] Sabela Ramos, et al. Java in the High Performance Computing arena: Research, practice and experience, 2013, Sci. Comput. Program.

[11] Vivek Sarkar, et al. Accelerating Habanero-Java programs with OpenCL generation, 2013, PPPJ.

[12] Vivek Sarkar, et al. Habanero-Java: the new adventures of old X10, 2011, PPPJ.

[13] Vivek Sarkar, et al. Speculative Execution of Parallel Programs with Precise Exception Semantics on GPUs, 2013, LCPC.

[14] Kevin Skadron, et al. Dymaxion: Optimizing memory access patterns for heterogeneous systems, 2011, 2011 International Conference for High Performance Computing, Networking, Storage and Analysis (SC).

[15] Fausto Spoto, et al. Definite Expression Aliasing Analysis for Java Bytecode, 2012, ICTAC.

[16] Wojciech Zaremba, et al. JaBEE: framework for object-oriented Java bytecode compilation and execution on graphics processor units, 2012, GPGPU-5.

[17] David Kaeli, et al. Introduction to Parallel Programming, 2013.

[18] W. B. VanderHeyden, et al. CartaBlanca—a pure-Java, component-based systems simulation tool for coupled nonlinear physics on unstructured grids—an update, 2003, Concurr. Comput. Pract. Exp.