Merging Compositions of Array Skeletons in SAC

The design of skeletons for expressing concurrent computations usually faces a conflict between software engineering demands and performance issues. Whereas the former favour versatile fine-grain skeletons that can be successively combined into larger programs, coarse-grain skeletons are more desirable from a performance perspective.

We describe a way out of this dilemma for array skeletons. In the functional array language SAC we internally represent individual array skeletons by one or more meta skeletons, called WITH-loops. The design of WITH-loops is carefully chosen to be versatile enough to cope with a large variety of skeletons, yet simple enough to allow for compilation into efficiently executable (parallel) code. Furthermore, WITH-loops are closed with respect to three tailor-made optimisation techniques that systematically transform compositions of simple, computationally light-weight skeletons into a few complex, computationally heavier-weight skeletons.
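The idea of merging a composition of light-weight skeletons into one heavier skeleton can be illustrated with a minimal sketch. It is written in Python rather than SAC, and it does not reproduce WITH-loop syntax or any of the three optimisations. The helpers `genarray` and `fold` are hypothetical stand-ins for array-building and reducing meta skeletons, and the merged version is written by hand to show the kind of result such a transformation aims for.

```python
from functools import reduce

def genarray(n, f):
    """Hypothetical array-building skeleton: element i is f(i)."""
    return [f(i) for i in range(n)]

def fold(op, neutral, n, f):
    """Hypothetical reducing skeleton: combines f(0) .. f(n-1) with op."""
    return reduce(op, (f(i) for i in range(n)), neutral)

def composed(xs):
    """Three fine-grain skeletons, each traversing a full array."""
    n = len(xs)
    squared = genarray(n, lambda i: xs[i] * xs[i])    # intermediate array
    shifted = genarray(n, lambda i: squared[i] + 1)   # intermediate array
    return fold(lambda a, b: a + b, 0, n, lambda i: shifted[i])

def merged(xs):
    """One coarse-grain skeleton: a single traversal, no intermediate arrays."""
    return fold(lambda a, b: a + b, 0, len(xs), lambda i: xs[i] * xs[i] + 1)

result_composed = composed([1, 2, 3])
result_merged = merged([1, 2, 3])
```

Both functions compute the same value, but `merged` does the work in one pass, which is the trade-off between composability and performance that the abstract describes.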
