Paper information - Analysis of OpenCL Work-Group Reduce for Intel GPUs

As hardware becomes more flexible in terms of programming, software APIs must expose hardware features in a portable way. Additions in the OpenCL 2.0 API expose thread communication through the newly defined work-group functions. In this paper we focus on two implementations of the work-group functions in the OpenCL compiler backend for Intel's GPUs. We first describe the particularities of Intel's GEN GPU architecture and the Beignet open-source OpenCL project. Both work-group implementations are then detailed: one is based on thread-to-thread message passing, the other on thread-to-shared-local-memory reads and writes. The focus is on choosing the better variant based on how each implementation maps to the hardware and how that mapping affects performance.
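To make the shared-local-memory variant mentioned in the abstract more concrete, here is a minimal host-side sketch in plain C. It is not the Beignet implementation and not the paper's code: the function name `wg_reduce_add_slm`, the bound `WG_MAX`, and the halving schedule are all assumptions made for illustration. The sketch models what an add-reduce (as exposed by the OpenCL 2.0 built-in `work_group_reduce_add`) could look like when every work-item writes its value to a shared scratch buffer and the buffer is then combined in rounds.

```c
#include <assert.h>
#include <stddef.h>

#define WG_MAX 256 /* hypothetical upper bound on the work-group size */

/* Host-side model of a work-group add-reduce through a shared scratch
 * buffer. Each element of item_values stands for one work-item's private
 * value; slm stands in for shared local memory. Illustration only. */
int wg_reduce_add_slm(const int *item_values, size_t wg_size)
{
    int slm[WG_MAX];

    assert(wg_size > 0 && wg_size <= WG_MAX);

    /* Step 1: every work-item writes its value to its own slot.
     * A real kernel would follow this with barrier(CLK_LOCAL_MEM_FENCE). */
    for (size_t lid = 0; lid < wg_size; ++lid)
        slm[lid] = item_values[lid];

    /* Step 2: shrink the active range by about half each round. In a
     * round, work-item lid adds the slot `half` positions above it.
     * Reads and writes touch disjoint slots within one round, so this
     * sequential loop matches a parallel round followed by a barrier. */
    for (size_t active = wg_size; active > 1;) {
        size_t half = (active + 1) / 2;
        for (size_t lid = 0; lid + half < active; ++lid)
            slm[lid] += slm[lid + half];
        active = half;
    }

    /* Slot 0 now holds the sum over the whole work-group. */
    return slm[0];
}
```

The sketch needs roughly log2(wg_size) rounds, each separated by a barrier on real hardware. A message-passing variant would instead move partial results directly between hardware threads without going through the scratch buffer; how that maps onto GEN hardware is what the paper analyzes and is not modeled here.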
