Declarative Parallel Programming for GPUs

The recent rise in the popularity of Graphics Processing Units (GPUs) has been fueled by software frameworks, such as NVIDIA’s Compute Unified Device Architecture (CUDA) and Khronos Group’s OpenCL, that make GPUs available for general-purpose computing. However, CUDA and OpenCL are still low-level approaches that require users to handle details of data layout and movement across levels of the memory hierarchy. We propose a declarative approach to coordinating computation and data movement between CPU and GPU, through a domain-specific language that we call Harlan. A declarative language not only obviates the need for the programmer to write low-level, error-prone boilerplate code; by raising the level of abstraction at which GPU computation is specified, it also allows the compiler to optimize data movement and to overlap CPU and GPU computation. By focusing on the “what”, and not the “how”, of data layout, data movement, and computation scheduling, the language eliminates the sources of many programming errors related to correctness and performance.
