Heterogeneous systems that combine multiple CPUs and GPUs for high-performance computing are becoming increasingly popular, and OpenCL (Open Computing Language) provides a framework for writing programs that execute across heterogeneous devices. Compared with OpenCL 1.2, the new features of OpenCL 2.0 give developers greater expressive power for programming heterogeneous computing environments. Currently, gem5-gpu, which combines gem5 and GPGPU-Sim, offers an experimental simulation environment for OpenCL. In gem5-gpu, gem5 supports only CUDA, whereas GPGPU-Sim can support OpenCL by compiling OpenCL kernel code to PTX code with real GPU drivers. However, this compilation flow in GPGPU-Sim supports only up to OpenCL 1.2. OpenCL 2.0 provides new features such as work-group built-in functions, extended atomic built-in functions, and device-side enqueue. To support OpenCL 2.0, the compiler must be extended so that it can compile OpenCL 2.0 kernel code to PTX code. In this paper, the proposed compiler is derived from the Low Level Virtual Machine (LLVM) compiler and extended with these features so that the emulator can support OpenCL 2.0. The proposed compiler creates a local buffer for each work-group to enable the work-group built-in functions, and it adds OpenCL 2.0 atomic built-in functions with memory order and memory scope to the NVPTX backend. Furthermore, CUDA APIs are used to implement OpenCL 2.0 device-side kernel enqueue, and Clang's compilation schemes are revised. The AMD APP SDK 3.0 and NTU OpenCL benchmarks are used to verify that the proposed compiler supports the OpenCL 2.0 features.
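The work-group built-in functions mentioned above depend on storage shared by the work-items of a single work-group. The Python sketch below emulates that idea on the CPU only to illustrate the semantics. It assumes a 1-D NDRange with uniform work-groups, uses a Python list to stand in for a per-work-group `__local` buffer, and uses `threading.Barrier` to stand in for a work-group barrier. The helper names (`run_ndrange_1d`, `work_group_reduce_add`, `work_group_scan_inclusive_add`) are hypothetical; the proposed compiler emits NVPTX code, and its actual buffer layout and sizing may differ.

```python
import threading
from typing import Callable, List

# Hypothetical per-work-item body: (local_id, global_id, local_buf, barrier) -> result.
WorkItemBody = Callable[[int, int, List[int], threading.Barrier], int]


def run_ndrange_1d(global_size: int, local_size: int, local_buf_size: int,
                   body: WorkItemBody) -> List[int]:
    """Run `body` once per work-item, one work-group at a time.

    Every work-group gets a fresh local buffer (a list standing in for
    __local memory) and a barrier shared only by its own work-items.
    """
    if local_size <= 0 or global_size % local_size != 0:
        raise ValueError("this sketch assumes uniform, non-empty work-groups")
    results = [0] * global_size
    for group_id in range(global_size // local_size):
        local_buf = [0] * local_buf_size         # per-work-group local buffer
        barrier = threading.Barrier(local_size)  # work-group barrier

        def work_item(lid: int) -> None:
            gid = group_id * local_size + lid
            results[gid] = body(lid, gid, local_buf, barrier)

        threads = [threading.Thread(target=work_item, args=(lid,))
                   for lid in range(local_size)]
        for t in threads:
            t.start()
        for t in threads:
            t.join()  # finish this work-group before the loop variables change
    return results


def work_group_reduce_add(values: List[int], local_size: int) -> List[int]:
    """Emulate work_group_reduce_add: every work-item gets its group's sum."""
    def body(lid: int, gid: int, buf: List[int], barrier: threading.Barrier) -> int:
        buf[lid] = values[gid]      # 1. each work-item publishes its value
        barrier.wait()              # 2. all values are now in the local buffer
        if lid == 0:                # 3. one work-item reduces the buffer...
            buf[local_size] = sum(buf[:local_size])
        barrier.wait()              # 4. ...and the result becomes visible to all
        return buf[local_size]
    return run_ndrange_1d(len(values), local_size, local_size + 1, body)


def work_group_scan_inclusive_add(values: List[int], local_size: int) -> List[int]:
    """Emulate work_group_scan_inclusive_add: prefix sums within each work-group."""
    def body(lid: int, gid: int, buf: List[int], barrier: threading.Barrier) -> int:
        buf[lid] = values[gid]
        barrier.wait()
        return sum(buf[: lid + 1])  # this work-item plus all lower-numbered ones
    return run_ndrange_1d(len(values), local_size, local_size, body)


if __name__ == "__main__":
    data = [1, 2, 3, 4, 5, 6, 7, 8]
    print(work_group_reduce_add(data, 4))          # [10, 10, 10, 10, 26, 26, 26, 26]
    print(work_group_scan_inclusive_add(data, 4))  # [1, 3, 6, 10, 5, 11, 18, 26]
```

In the reduction, letting work-item 0 combine the buffer and then broadcast the result through a second barrier keeps the sketch short; on real GPUs, a tree reduction with one barrier per step is the more common choice for large work-groups.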