Three Parallel Programming Paradigms: Comparisons on an Archetypal PDE Computation

Three paradigms for distributed-memory parallel computation that free the application programmer from the details of message passing are compared on an archetypal structured scientific computation, a nonlinear, structured-grid partial differential equation (PDE) boundary value problem, using the same algorithm on the same hardware. All three paradigms (parallel languages, represented by the Portland Group's HPF; (semi-)automated serial-to-parallel source-to-source translation, represented by CAPTools from the University of Greenwich; and parallel libraries, represented by Argonne's PETSc) are found to be easy to use for this problem class, and all are reasonably effective in exploiting concurrency after a short learning curve. Under every paradigm, the application programmer must at least specify the data partitioning, which corresponds to a geometrically simple decomposition of the domain of the PDE. Programming in SPMD style for the PETSc library requires writing only the routines that discretize the PDE and its Jacobian, managing subdomain-to-processor mappings (affine global-to-local index mappings), and interfacing to the library's solver routines. Programming for HPF requires a complete sequential implementation of the same algorithm as a starting point, introduction of concurrency through subdomain blocking (a task similar to the index mapping), and modest experimentation with rewriting loops so that the latent concurrency becomes evident to the compiler. Programming with CAPTools involves feeding the same sequential implementation to the CAPTools interactive parallelization system and guiding the source-to-source code transformation by answering queries about quantities that are knowable only at runtime.
Results representative of "the state of the practice" for a scaled sequence of structured-grid problems are given on three of the most important contemporary high-performance platforms: the IBM SP, the SGI Origin 2000, and the Cray T3E.
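
The affine global-to-local index mapping mentioned above can be illustrated with a minimal sketch. It assumes a 2D grid of mx-by-my points, a px-by-py logical processor grid onto which ranks are laid out with x varying fastest, a balanced contiguous block split in each direction, and a stencil width of one (as for a five-point stencil). All names (`split_1d`, `Subdomain`, `make_subdomain`) are hypothetical; the field names only resemble the xs/xm convention common in PETSc structured-grid examples, and the code does not call PETSc, HPF, or CAPTools.

```python
from dataclasses import dataclass


def split_1d(n: int, p: int, rank: int) -> tuple[int, int]:
    """Return (start, count) of the contiguous block of n points owned by
    `rank` out of p processes; the first n % p ranks get one extra point."""
    base, rem = divmod(n, p)
    count = base + (1 if rank < rem else 0)
    start = rank * base + min(rank, rem)
    return start, count


@dataclass
class Subdomain:
    xs: int   # first owned global x index
    xm: int   # number of owned points in x
    ys: int   # first owned global y index
    ym: int   # number of owned points in y
    gxs: int  # first ghosted global x index (clipped at the physical boundary)
    gxm: int  # ghosted width in x
    gys: int  # first ghosted global y index
    gym: int  # ghosted width in y

    def owns(self, i: int, j: int) -> bool:
        """True if global point (i, j) is owned (not merely a ghost) here."""
        return (self.xs <= i < self.xs + self.xm
                and self.ys <= j < self.ys + self.ym)

    def global_to_local(self, i: int, j: int) -> int:
        """Affine map from global point (i, j) to the flat index in this
        process's ghosted local array (x varies fastest)."""
        li, lj = i - self.gxs, j - self.gys
        if not (0 <= li < self.gxm and 0 <= lj < self.gym):
            raise IndexError(f"({i}, {j}) is outside this ghosted subdomain")
        return lj * self.gxm + li


def make_subdomain(mx: int, my: int, px: int, py: int, rank: int,
                   stencil_width: int = 1) -> Subdomain:
    """Block-decompose an mx-by-my grid over px-by-py processes and return
    the owned and ghosted index ranges of `rank`."""
    pi, pj = rank % px, rank // px          # position in the processor grid
    xs, xm = split_1d(mx, px, pi)
    ys, ym = split_1d(my, py, pj)
    # Ghost region: owned block widened by the stencil, clipped to the grid.
    gxs, gxe = max(xs - stencil_width, 0), min(xs + xm + stencil_width, mx)
    gys, gye = max(ys - stencil_width, 0), min(ys + ym + stencil_width, my)
    return Subdomain(xs, xm, ys, ym, gxs, gxe - gxs, gys, gye - gys)


if __name__ == "__main__":
    # 10 x 8 grid on a 3 x 2 processor grid; inspect the interior rank 4.
    sd = make_subdomain(10, 8, 3, 2, rank=4)
    print(sd)
    print(sd.global_to_local(4, 4))
```

For rank 4 the owned block is x in [4, 7) and y in [4, 8), and the ghosted block starts at (3, 3) with width 5 in each direction, so global point (4, 4) maps to local index 1*5 + 1 = 6. The map is affine because the local index, (j - gys)*gxm + (i - gxs), is a linear function of (i, j) plus a constant offset; this is the bookkeeping that, in the PETSc version, the programmer manages explicitly, and that subdomain blocking reproduces in the HPF version.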
