A Skeletal Parallel Framework with Fusion Optimizer for GPGPU Programming

Although today's graphics processing units (GPUs) have high performance and general-purpose computing on GPUs (GPGPU) is actively studied, developing GPGPU applications remains difficult for two reasons. First, both parallelization and optimization of GPGPU applications is necessary to achieve high performance. Second, the suitability of the target application for GPGPU must be determined, because whether an application performs well with GPGPU heavily depends on its inherent properties, which are not obvious from the source code. To overcome these difficulties, we developed a skeletal parallel programming framework for rapid GPGPU application developments. It enables programmers to easily write GPGPU applications and rapidly test them because it generates programs for both GPUs and CPUs from the same source code. It also provides an optimization mechanism based on fusion transformation. Its effectiveness was confirmed experimentally.

[1]  Zhenjiang Hu,et al.  A New Parallel Skeleton for General Accumulative Computations , 2004, International Journal of Parallel Programming.

[2]  Christoph W. Kessler,et al.  BlockLib: a skeleton library for cell broadband engine , 2008, IWMSE '08.

[3]  Sergei Gorlatch,et al.  TOWARDS PARALLEL PROGRAMMING BY TRANSFORMATION: THE FAN SKELETON FRAMEWORK , 2001, Parallel Algorithms Appl..

[4]  Naga K. Govindaraju,et al.  A Survey of General‐Purpose Computation on Graphics Hardware , 2007 .

[5]  Philip Wadler,et al.  Deforestation: Transforming Programs to Eliminate Trees , 1990, Theor. Comput. Sci..

[6]  Jens H. Krüger,et al.  GPGPU: general purpose computation on graphics hardware , 2004, SIGGRAPH '04.

[7]  Rudolf Eigenmann,et al.  OpenMP to GPGPU: a compiler framework for automatic translation and optimization , 2009, PPoPP '09.

[8]  Herbert Kuchen,et al.  A Skeleton Library , 2002, Euro-Par.

[9]  Brian Campbell,et al.  Amortised Memory Analysis Using the Depth of Data Structures , 2009, ESOP.

[10]  Pat Hanrahan,et al.  Brook for GPUs: stream computing on graphics hardware , 2004, SIGGRAPH 2004.

[11]  Murray Cole,et al.  Algorithmic Skeletons: Structured Management of Parallel Computation , 1989 .

[12]  Clemens Grelck,et al.  Merging Compositions of Array Skeletons in SAC , 2005, PARCO.

[13]  Zhenjiang Hu,et al.  A library of constructive skeletons for sequential style of parallel programming , 2006, InfoScale '06.

[14]  Teresa H. Y. Meng,et al.  Merge: a programming model for heterogeneous multi-core systems , 2008, ASPLOS.

[15]  Marco Danelutto,et al.  Euro-Par 2004 Parallel Processing , 2004, Lecture Notes in Computer Science.

[16]  Wei-Ngan Chin Safe fusion of functional expressions , 1992, LFP '92.

[17]  Sanjay Ghemawat,et al.  MapReduce: Simplified Data Processing on Large Clusters , 2004, OSDI.

[18]  William J. Dally,et al.  The Imagine Stream Processor , 2002, Proceedings. IEEE International Conference on Computer Design: VLSI in Computers and Processors.

[19]  Zhenjiang Hu,et al.  A Fusion-Embedded Skeleton Library , 2004, Euro-Par.

[20]  Michael F. P. O'Boyle,et al.  Compiler Reduction of Invalidation Traffic in Virtual Shared Memory Systems , 1996, Euro-Par, Vol. I.

[21]  Anne-Marie Kermarrec,et al.  Proceedings of the 13th European international conference on Parallel Processing , 2007 .

[22]  Naga K. Govindaraju,et al.  Mars: A MapReduce Framework on graphics processors , 2008, 2008 International Conference on Parallel Architectures and Compilation Techniques (PACT).

[23]  Sven-Bodo Scholz,et al.  Single Assignment C: efficient support for high-level array operations in a functional setting , 2003, Journal of Functional Programming.

[24]  D J Evans,et al.  Parallel processing , 1986 .

[25]  David B. Skillicorn,et al.  The Bird-Meertens Formalism as a Parallel Model , 1993 .

[26]  Stephen Gilmore,et al.  Flexible Skeletal Programming with eSkel , 2005, Euro-Par.

[27]  Masato Takeichi,et al.  Domain-Specific Optimization Strategy for Skeleton Programs , 2007, Euro-Par.

[28]  Masato Takeichi,et al.  An Accumulative Parallel Skeleton for All , 2002, APLAS.

[29]  Jean-Thierry Lapresté,et al.  Quaff: efficient C++ design for parallel skeletons , 2006, Parallel Comput..

[30]  Simon L. Peyton Jones,et al.  A short cut to deforestation , 1993, FPCA '93.

[31]  Sergei Gorlatch,et al.  Systematic Efficient Parallelization of Scan and Other List Homomorphisms , 1996, Euro-Par, Vol. II.