Optimizations on Array Skeletons in a Shared Memory Environment

Map- and fold-like skeletons are suitable abstractions to guide parallel program execution in functional array processing. However, when it comes to achieving high performance, confining compilation efforts to individual skeletons turns out to be insufficient. This paper proposes compilation schemes that aim to reduce the runtime overhead of communication and synchronization by embedding multiple array skeletons within a so-called SPMD meta skeleton. The meta skeleton takes exclusive responsibility for organizing parallel program execution, whereas the original array skeletons are reduced to their individual numerical operations. Although the concrete compilation schemes assume multithreading in a shared memory environment as the underlying execution model, the ideas can be carried over to other settings straightforwardly. Preliminary performance investigations help to quantify the potential benefits.
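The idea of embedding several array skeletons in one SPMD region can be illustrated with a minimal sketch. It is not the paper's compilation scheme and is not written in SAC: the names `spmd`, `chunk`, and `sum_of_squares_offsets`, the block partitioning, and the use of Python threads are all assumptions made for illustration. The workers are started and joined once, a map and a fold run back to back on each worker's own block without synchronization, and a single barrier separates them from a second map that needs the global fold result.

```python
import threading

def spmd(num_threads, body):
    """Hypothetical meta skeleton: start the workers once, run body in each, join once."""
    barrier = threading.Barrier(num_threads)
    workers = [threading.Thread(target=body, args=(rank, barrier))
               for rank in range(num_threads)]
    for w in workers:
        w.start()
    for w in workers:
        w.join()

def chunk(rank, num_threads, n):
    """Block partitioning (an assumption): half-open index range owned by rank."""
    return rank * n // num_threads, (rank + 1) * n // num_threads

def sum_of_squares_offsets(xs, num_threads=4):
    """Map (square), fold (sum), map (total - square) inside one SPMD region."""
    n = len(xs)
    squares = [0] * n
    partial = [0] * num_threads
    out = [0] * n

    def body(rank, barrier):
        lo, hi = chunk(rank, num_threads, n)
        # First map skeleton: purely numerical, restricted to this worker's block.
        for i in range(lo, hi):
            squares[i] = xs[i] * xs[i]
        # Fold skeleton on the same block: no barrier is needed in between,
        # because each worker reads only the elements it has just written.
        partial[rank] = sum(squares[lo:hi])
        # The only synchronization point: all partial results must be complete.
        barrier.wait()
        total = sum(partial)  # combined redundantly by every worker
        # Second map skeleton, which depends on the global fold result.
        for i in range(lo, hi):
            out[i] = total - squares[i]

    spmd(num_threads, body)
    return sum(partial), out
```

Compiling each skeleton in isolation would correspond to three separate fork/join phases in this sketch; the fused version needs one thread start-up, one barrier, and one join.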
