Simple optimizations for an applicative array language for graphics processors

Graphics processors (GPUs) are highly parallel devices that promise high performance, and they are now flexible enough to be used for general-purpose computing. A programming language based on implicitly data-parallel collective array operations can permit high-level, effective programming of GPUs. I describe three optimizations for such a language: automatic use of the GPU's shared-memory cache, array fusion, and hoisting of nested parallel constructs. These optimizations are simple to implement because of the design of the language to which they are applied, yet they can result in large run-time speedups.
