SparkJNI: A Toolchain for Hardware Accelerated Big Data Apache Spark

The JVM (Java virtual machine) is the cornerstone of most big data frameworks, providing automatic memory management and enabling high-productivity languages. Aside from the performance overhead induced by JVM languages (e.g., Java and Scala), big data frameworks, including Spark, also restrict code execution to general-purpose processors (CPUs), while HPC clusters readily include dedicated accelerators to achieve their high performance. In this paper, we analyze the state-of-the-art developments in the field of heterogeneously accelerated Spark, and we propose SparkJNI, a framework for JNI-accelerated Spark. The design makes two main contributions. First, it enables seamless use of native CPU code, as well as integration of GPU and FPGA accelerators. Second, it automatically generates C++ code wrappers for the native code, which simplifies development for the programmer. This makes it non-disruptive to the Java programmer, while allowing great flexibility for native code development. Results from a number of benchmarks show insignificant JNI-induced overhead in access time and bandwidth, with speedups of up to 12x for compute-intensive kernels (such as convolution) in comparison to pure Java Spark implementations. Finally, a DNA analysis algorithm (Pair-HMM) is implemented in Spark and integrated with FPGAs, targeting cluster deployments, with benchmark results showing an overall speedup of approximately 2.7x over state-of-the-art CPU optimizations. The results of the presented work, along with the SparkJNI framework, are publicly available on GitHub for open-source usage and development.
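
To make the idea of native code integration concrete, the following is a minimal sketch of how a Java class can declare a JNI-bound kernel next to a pure Java reference implementation. It is not SparkJNI's API or its generated wrapper code: the class name `NativeConvolutionSketch`, the library name `convolution_sketch`, and the method names are all hypothetical, and the native library is assumed to be absent, in which case the sketch falls back to the Java path.

```java
import java.util.Arrays;

// Hypothetical illustration only; none of these names come from SparkJNI.
final class NativeConvolutionSketch {

    // Try to load a native library (assumed name). If it is not on
    // java.library.path, the pure Java implementation is used instead.
    private static final boolean NATIVE_AVAILABLE = tryLoad("convolution_sketch");

    private static boolean tryLoad(String libName) {
        try {
            System.loadLibrary(libName);
            return true;
        } catch (UnsatisfiedLinkError | SecurityException e) {
            return false;
        }
    }

    // Declared here, implemented in C/C++ and bound through JNI.
    private static native float[] convolveNative(float[] signal, float[] kernel);

    // Pure Java reference: 1-D "valid" convolution (kernel is flipped).
    // Assumes signal.length >= kernel.length.
    static float[] convolveJava(float[] signal, float[] kernel) {
        int k = kernel.length;
        int outLen = signal.length - k + 1;
        float[] out = new float[outLen];
        for (int i = 0; i < outLen; i++) {
            float acc = 0f;
            for (int j = 0; j < k; j++) {
                acc += signal[i + j] * kernel[k - 1 - j];
            }
            out[i] = acc;
        }
        return out;
    }

    // Dispatch to the native kernel when the library was loaded.
    static float[] convolve(float[] signal, float[] kernel) {
        return NATIVE_AVAILABLE
                ? convolveNative(signal, kernel)
                : convolveJava(signal, kernel);
    }

    public static void main(String[] args) {
        float[] signal = {1f, 2f, 3f, 4f};
        float[] kernel = {1f, 0f, -1f};
        System.out.println(Arrays.toString(convolve(signal, kernel)));
    }
}
```

In a Spark job, a method such as `convolve` would typically be called from inside a transformation (for example a `map` over partitions of input arrays), so the Java-side code stays unchanged regardless of whether the native path is taken.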
