Application-Driven Development of an Integrated Tool Environment for Distributed-Memory Parallel Processors

The Joint CSCS-ETH/NEC Collaboration in Parallel Processing comprises the development of an integrated tool environment together with applications and algorithms for distributed-memory parallel processors (DMPPs). Tool and application developers interact closely: the requirements of the tools are defined by the needs of the application developers, and once an application requirement becomes an integral part of the tool environment, the tools ease parallelization of similar applications and whole application classes. Additional features of the project are the use of a standardized DMPP high-level programming language, High Performance Fortran (HPF), and of a standardized low-level message-passing interface (MPI). The tool environment integrates parallelization support, a parallel debugger, and a performance monitor and analyzer. Applications already investigated include unstructured problems. In this paper we summarize the tool and application development efforts and show preliminary performance results of three applications effectively parallelized on two DMPP platforms with the assistance of our tool environment.
