Programming methodology and performance issues for advanced computer architectures

Abstract
This paper describes some recent attempts to construct transportable numerical software for high-performance computers. We review the restructuring of algorithms in terms of simple linear algebra modules. This technique has proved very successful in achieving a high level of transportability without severe loss of performance on a wide variety of vector and parallel computers. Using modules to encapsulate parallelism and to reduce the ratio of data movement to floating-point operations has been demonstrably effective for regular problems such as those found in dense linear algebra. In other situations it may be necessary to express parallel algorithms explicitly. We also present a programming methodology for constructing new parallel algorithms that require sophisticated synchronization at a large-grain level. We describe the SCHEDULE package, which provides an environment for developing and analyzing portable, explicitly parallel FORTRAN programs. The package now includes a preprocessor that makes user-level code completely portable and a graphics postprocessor for performance analysis and debugging. We discuss the details of porting both the SCHEDULE package and user code. Examples from linear algebra and partial differential equations illustrate the utility of this approach.
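The claim that modules can reduce the ratio of data movement to floating-point operations can be made concrete with a small sketch. It assumes plain Python lists in place of Fortran arrays. `block_gemm` is a hypothetical stand-in for a Level 3 BLAS style matrix-multiply kernel, not the actual DGEMM calling sequence. `blocked_matmul` expresses the whole product as calls to that one module, and the two ratio helpers count flops per word of operand data for a vector update (y <- a*x + y) versus an nb x nb block update.

```python
def axpy_ratio(n):
    # Vector module y <- a*x + y: n multiplies + n adds = 2n flops,
    # touching x (read), y (read) and y (write): 3n words.
    return (2 * n) / (3 * n)


def block_gemm_ratio(nb):
    # Block module C <- C + A*B on nb x nb blocks: 2*nb^3 flops,
    # touching three nb x nb blocks (counting each block once): 3*nb^2 words.
    return (2 * nb ** 3) / (3 * nb ** 2)


def block_gemm(C, A, B, i0, j0, k0, nb, n):
    # Hypothetical kernel: C[i0:i0+nb, j0:j0+nb] += A[i0:i0+nb, k0:k0+nb] * B[k0:k0+nb, j0:j0+nb]
    i_end, j_end, k_end = min(i0 + nb, n), min(j0 + nb, n), min(k0 + nb, n)
    for i in range(i0, i_end):
        for k in range(k0, k_end):
            a_ik = A[i][k]  # reused across a whole row of the block
            for j in range(j0, j_end):
                C[i][j] += a_ik * B[k][j]


def blocked_matmul(A, B, nb):
    # The full n x n product expressed purely as a sequence of block-module calls.
    n = len(A)
    C = [[0.0] * n for _ in range(n)]
    for i0 in range(0, n, nb):
        for j0 in range(0, n, nb):
            for k0 in range(0, n, nb):
                block_gemm(C, A, B, i0, j0, k0, nb, n)
    return C
```

The vector update's ratio stays at 2/3 whatever n is, while the block update's ratio grows as 2nb/3, so a machine-specific implementation of the block module has room to keep operands in registers, vector registers, or cache and amortize their movement over many operations.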
