Enabling PoCL-based runtime frameworks on the HSA for OpenCL 2.0 support

Abstract

The heterogeneous system architecture (HSA), announced by the HSA Foundation, is an approach to integrating central processing unit (CPU) and graphics processing unit (GPU) architectures. The Open Computing Language (OpenCL) is a programming framework for exploiting such heterogeneous architectures. OpenCL 1.2, the widely used current version of the framework, provides programming models for heterogeneous computing, and the OpenCL 2.0 specification adds features that map onto HSA capabilities, such as shared virtual memory (SVM). In previous work, we helped enable Portable Computing Language (PoCL)-based OpenCL 1.2 runtime frameworks on the HSA; PoCL is a widely used open-source implementation of OpenCL. In this paper, we further extend the PoCL-based runtime on the HSA to support OpenCL 2.0 features. To the best of our knowledge, this is the first work to support PoCL-based OpenCL 2.0 features on the HSA. Compared with OpenCL 1.2, OpenCL 2.0 supports SVM, nested parallelism, pipes, and atomic operations, which in turn enable parallel design patterns such as tree searches, pointer-based programming, and nested parallelism models. Our design flow can help academics enable an OpenCL 2.0 flow on the HSA and build on it in further research. The experimental results indicate that our framework provides adequate features to support advanced research.