HSTREAM: A Directive-Based Language Extension for Heterogeneous Stream Computing

Big data streaming applications require heterogeneous parallel computing systems, which may comprise multiple multi-core CPUs and many-core accelerators such as NVIDIA GPUs and Intel Xeon Phis. Programming such systems requires advanced knowledge of several hardware architectures and device-specific programming models, including OpenMP and CUDA. In this paper, we present HSTREAM, a compiler directive-based language extension that supports programming stream computing applications for heterogeneous parallel computing systems. The HSTREAM source-to-source compiler aims to increase programming productivity by letting programmers annotate parallel regions for heterogeneous execution and by generating target-specific code. The HSTREAM runtime automatically distributes the workload across CPUs and accelerators. We demonstrate the usefulness of the HSTREAM language extension with various applications from the STREAM benchmark. Experimental evaluation results show that HSTREAM keeps the same programming simplicity as OpenMP, and that the generated code can deliver performance beyond what CPU-only and GPU-only executions deliver.
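
To make the idea of distributing one parallel region across a CPU and an accelerator concrete, the following is a minimal sketch of a STREAM-style triad kernel whose index range is split into two chunks that run concurrently. It is not HSTREAM syntax and does not reproduce the HSTREAM runtime: the names `split_range`, `triad_chunk`, `heterogeneous_triad`, and the `cpu_fraction` parameter are hypothetical, and two Python threads merely stand in for the CPU and the accelerating device.

```python
from concurrent.futures import ThreadPoolExecutor


def split_range(n, cpu_fraction):
    """Split indices [0, n) into a CPU chunk and a device chunk.

    cpu_fraction is a hypothetical tuning knob, not an HSTREAM parameter.
    """
    if not 0.0 <= cpu_fraction <= 1.0:
        raise ValueError("cpu_fraction must be in [0, 1]")
    cut = int(n * cpu_fraction)
    return (0, cut), (cut, n)


def triad_chunk(a, b, c, scalar, start, stop):
    """STREAM triad on one chunk: a[i] = b[i] + scalar * c[i]."""
    for i in range(start, stop):
        a[i] = b[i] + scalar * c[i]


def heterogeneous_triad(b, c, scalar, cpu_fraction=0.5):
    """Run the triad with one worker per chunk and return the result list.

    The two workers are plain threads standing in for a CPU and an
    accelerator; a real system would launch device-specific code instead.
    """
    n = len(b)
    a = [0.0] * n
    cpu_chunk, device_chunk = split_range(n, cpu_fraction)
    with ThreadPoolExecutor(max_workers=2) as pool:
        futures = [
            pool.submit(triad_chunk, a, b, c, scalar, *cpu_chunk),
            pool.submit(triad_chunk, a, b, c, scalar, *device_chunk),
        ]
        for future in futures:
            future.result()  # re-raise any exception from a worker
    return a
```

The chunks are disjoint, so the workers never write to the same element and no synchronization beyond the final join is needed. In this sketch the split ratio is fixed by the caller; how the actual runtime chooses the distribution is not specified here.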
