Analysis of thread workgroup broadcast for Intel GPUs
As hardware becomes more programmable, software APIs must expose hardware features in a portable way. OpenCL 2.0 exposes thread-to-thread communication through its newly defined work-group functions. In this paper we analyze the work-group broadcast functionality in the OpenCL compiler backend for Intel's GPUs. We first describe the particularities of Intel's GEN GPU architecture and the Beignet open source OpenCL project. We then describe the work-group broadcast implementation, which uses shared local memory reads and writes for thread-to-thread communication. Finally, we analyze its performance and how the implementation maps to hardware, motivating the design decisions.
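To make the mechanism concrete, the following is a minimal host-side sketch of a broadcast through shared local memory: one work-item writes its value, all work-items synchronize, and each then reads the value back. It is a Python simulation using threads, not the Beignet implementation or OpenCL kernel code; the names `work_group_broadcast`, `slm`, and `work_item` are hypothetical and chosen for illustration only.

```python
import threading


def work_group_broadcast(values, source_id):
    """Simulate broadcasting values[source_id] to every work-item in one work-group.

    Assumption: each Python thread stands in for one work-item, and a
    one-slot list stands in for shared local memory (SLM).
    """
    group_size = len(values)
    slm = [None]                              # stand-in for one SLM slot
    barrier = threading.Barrier(group_size)   # stand-in for a work-group barrier
    results = [None] * group_size

    def work_item(local_id):
        # Only the source work-item writes its value to shared local memory.
        if local_id == source_id:
            slm[0] = values[local_id]
        # Every work-item waits here until the write has happened.
        barrier.wait()
        # Every work-item reads the broadcast value back.
        results[local_id] = slm[0]

    threads = [threading.Thread(target=work_item, args=(i,))
               for i in range(group_size)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results
```

Calling `work_group_broadcast([10, 20, 30, 40], 2)` gives every simulated work-item the value held by work-item 2. A real kernel would typically need a further synchronization before the same local memory slot is reused; that step is omitted from this sketch.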