Dissecting a CPU/GPU OpenCL Implementation
This chapter shows one very specific mapping of OpenCL onto an architectural implementation, and how that mapping differs slightly between a CPU architecture and a GPU architecture. The core principles of the chapter also apply to competing CPU and GPU architectures, but significant differences in performance can easily arise from variations in vector width, thread context management, and instruction scheduling. OpenCL is designed so that its model maps well onto a wide range of architectures while still allowing kernel code to be tuned and accelerated for each one.

The OpenCL CPU runtime creates one thread per CPU core, and together these threads form a work pool that processes OpenCL kernels as they are generated. Work is handed to these worker threads by a core management thread, one per command queue, whose role is to remove the first entry from its queue and set up work for the worker threads.
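The dispatch structure described above (one worker thread per core, fed by a management thread that pops the oldest entry from a command queue) can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the code of any real OpenCL runtime: `CpuWorkPool`, `KernelCommand`, the restriction to 1-D NDRanges, and the choice to split each launch into work-groups that run serially on one worker are all hypothetical. It also assumes a single in-order queue, so the management thread waits for every work-group of one command before taking the next. Because of Python's GIL, the workers illustrate the structure, not the performance, of a native CPU runtime.

```python
import os
import queue
import threading
from dataclasses import dataclass, field
from typing import Callable, Optional


@dataclass
class KernelCommand:
    # Hypothetical stand-in for one enqueued NDRange kernel launch (1-D only).
    kernel: Callable[[int], None]  # called once per work-item with its global id
    global_size: int
    local_size: int
    done: threading.Event = field(default_factory=threading.Event)


class CpuWorkPool:
    """Hypothetical sketch: one worker thread per core, plus one management
    thread that drains a single in-order command queue."""

    def __init__(self, num_workers: Optional[int] = None) -> None:
        self.commands = queue.Queue()     # FIFO of KernelCommand (None = stop)
        self.work_groups = queue.Queue()  # work-groups ready for the workers
        n = num_workers or os.cpu_count() or 1
        self.workers = [threading.Thread(target=self._worker) for _ in range(n)]
        for w in self.workers:
            w.start()
        self.manager = threading.Thread(target=self._manage)
        self.manager.start()

    def enqueue(self, cmd: KernelCommand) -> KernelCommand:
        self.commands.put(cmd)
        return cmd

    def _manage(self) -> None:
        while True:
            cmd = self.commands.get()  # remove the first (oldest) entry
            if cmd is None:
                break
            # Set up work: split the NDRange into work-groups for the pool.
            for start in range(0, cmd.global_size, cmd.local_size):
                end = min(start + cmd.local_size, cmd.global_size)
                self.work_groups.put((cmd.kernel, start, end))
            self.work_groups.join()  # in-order: wait until every group is done
            cmd.done.set()
        for _ in self.workers:
            self.work_groups.put(None)  # one stop sentinel per worker

    def _worker(self) -> None:
        while True:
            group = self.work_groups.get()
            if group is None:
                self.work_groups.task_done()
                break
            kernel, start, end = group
            try:
                # The work-items of one group run serially on this thread.
                for gid in range(start, end):
                    kernel(gid)
            finally:
                self.work_groups.task_done()

    def shutdown(self) -> None:
        self.commands.put(None)
        self.manager.join()
        for w in self.workers:
            w.join()


if __name__ == "__main__":
    pool = CpuWorkPool(num_workers=4)
    out = [0] * 10

    def square(gid: int) -> None:
        out[gid] = gid * gid

    cmd = pool.enqueue(KernelCommand(square, global_size=10, local_size=4))
    cmd.done.wait()
    pool.shutdown()
    print(out)  # [0, 1, 4, 9, 16, 25, 36, 49, 64, 81]
```

Waiting on the whole batch with `queue.Queue.join()` is what gives the sketch in-order semantics: a second command that reads the first command's output never starts early. A runtime supporting out-of-order queues would instead track completion per command and let independent work-groups interleave.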