The design and implementation of a region-based parallel programming language

Programming parallel computers is extremely challenging even for expert programmers, let alone for scientists in other disciplines whose computations often drive the acquisition of such machines. This dissertation describes the design and implementation of ZPL, a programming language created to simplify the task of programming parallel computers. ZPL allows programmers to write code using a global view that describes their algorithms at a high level rather than implementing per-processor behavior. Unlike other global-view languages, however, ZPL permits users to reason about the parallel implementation of their code at the syntactic level, allowing them to make informed algorithmic decisions based on the program's parallel implementation.

The language feature that supports this duality is the region. A region is simply a language-level index set that programmers can define, name, and manipulate using high-level operators. Regions constitute a unique means of specifying array computation, serving as an alternative to traditional array indexing and slicing. By distributing each region's indices across a set of processors, a parallel interpretation of a ZPL program is achieved.

This dissertation studies the impact of the region concept throughout the design and implementation of ZPL. It begins by defining the region concept and its use in the language. It then gives a parallel interpretation of regions, which results in ZPL's syntax-based performance model. ZPL's implementation and runtime libraries are described in detail to show how regions are represented and used at runtime. The design and implementation of a paradigm-neutral interface for efficient portable communication is also described.
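To make the region concept concrete, the following is a minimal Python model (not ZPL itself, and not the dissertation's implementation) of the two ideas above: a region as a named, language-level index set, and a block distribution of that index set across a set of processors, which is what gives region-based statements their parallel interpretation. The names `region`, `block_distribute`, and `num_procs` are illustrative assumptions.

```python
from itertools import product

def region(*ranges):
    """Model a region as an explicit list of index tuples,
    e.g. region(range(4), range(4)) is a 4x4 index set."""
    return list(product(*ranges))

def block_distribute(indices, num_procs):
    """Assign each index to a processor by blocking the first
    dimension, mimicking a simple 1D block distribution."""
    lo = min(i[0] for i in indices)
    hi = max(i[0] for i in indices)
    span = hi - lo + 1
    # Each processor owns a contiguous band of rows.
    return {idx: min((idx[0] - lo) * num_procs // span, num_procs - 1)
            for idx in indices}

R = region(range(4), range(4))    # a 4x4 region: 16 indices
owner = block_distribute(R, 2)    # rows 0-1 -> proc 0, rows 2-3 -> proc 1
```

In this model, an array statement "over R" would be executed by having each processor apply the statement only to the indices it owns, which is the essence of the parallel interpretation the abstract describes.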
Finally, two extensions to the basic region concept are given: parameterized regions, which can be used to implement hierarchical algorithms such as the multigrid method, and sparse regions, which can be used to specify sparse computation over sparse or dense arrays. Throughout the dissertation, regions are evaluated by comparing ZPL programs to other languages in terms of clarity and performance. The conclusion is that regions are a crisp and powerful mechanism for array-based parallel programming.
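The sparse-region extension can be sketched in the same illustrative Python model: a sparse region is an explicit set of indices, and an array statement executed over it touches only those indices of otherwise dense arrays. The function name `sparse_apply` and the element-wise addition are assumptions chosen for illustration, not the dissertation's API.

```python
def sparse_apply(sparse_region, dense_a, dense_b):
    """Element-wise add two dense 2D arrays, but only at the
    indices named by the sparse region; other entries keep
    the values of dense_a."""
    result = [row[:] for row in dense_a]   # copy A
    for (i, j) in sparse_region:
        result[i][j] = dense_a[i][j] + dense_b[i][j]
    return result

A = [[1, 1], [1, 1]]
B = [[9, 9], [9, 9]]
S = {(0, 0), (1, 1)}                 # a sparse region: just the diagonal
C = sparse_apply(S, A, B)            # -> [[10, 1], [1, 10]]
```

The point of the sketch is that the same region abstraction governs which indices participate in a computation, whether the underlying index set is dense or sparse.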
