Global arrays: A nonuniform memory access programming model for high-performance computers

Portability, efficiency, and ease of coding are all important considerations in choosing the programming model for a scalable parallel application. The message-passing programming model is widely used because of its portability, yet some applications are too complex to code in it while also maintaining a balanced computational load and avoiding redundant computations. The shared-memory programming model simplifies coding, but it is not portable and often provides little control over interprocessor data transfer costs. This paper describes an approach, called Global Arrays (GAs), that combines the better features of both models, leading to both simple coding and efficient execution. The key concept of GAs is that they provide a portable interface through which each process in a MIMD parallel program can asynchronously access logical blocks of physically distributed matrices, with no need for explicit cooperation by other processes. We have implemented the GA library on a variety of computer systems, including the Intel Delta and Paragon and the IBM SP-1 and SP-2 (all message-passing machines), the Kendall Square Research KSR-1/2 and the Convex SPP-1200 (nonuniform-access shared-memory machines), the CRAY T3D (a globally addressable distributed-memory computer), and networks of UNIX workstations. We discuss the design and implementation of these libraries, report their performance, illustrate the use of GAs in the context of computational chemistry applications, and describe the use of a GA performance visualization tool.
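The key concept above, one-sided access to logical blocks of a physically distributed matrix, can be illustrated with a small sketch. This is not the GA library or its API: `ToyGlobalArray`, its `get`, `put`, and `acc` methods, the row-block distribution, and the use of threads and locks to stand in for processes are all assumptions made for illustration. The point is only that a caller names a block in global indices, and the translation to owners and local offsets happens behind the interface, without the owning "process" taking part.

```python
import threading


class ToyGlobalArray:
    """Hypothetical stand-in for a global array (not the real GA API).

    Rows are split into contiguous blocks, one block per simulated process.
    """

    def __init__(self, nrows, ncols, nprocs):
        base, extra = divmod(nrows, nprocs)
        self.bounds = []  # global row range [lo, hi) owned by each process
        start = 0
        for p in range(nprocs):
            stop = start + base + (1 if p < extra else 0)
            self.bounds.append((start, stop))
            start = stop
        # "Local memory" of each simulated process: only the rows it owns.
        self.local = [[[0.0] * ncols for _ in range(hi - lo)]
                      for lo, hi in self.bounds]
        self.locks = [threading.Lock() for _ in range(nprocs)]

    def _locate(self, row):
        """Translate a global row index to (owner, local row index)."""
        for p, (lo, hi) in enumerate(self.bounds):
            if lo <= row < hi:
                return p, row - lo
        raise IndexError(f"row {row} is outside the array")

    def get(self, row_lo, row_hi, col_lo, col_hi):
        """Copy the block [row_lo, row_hi) x [col_lo, col_hi) to the caller."""
        block = []
        for row in range(row_lo, row_hi):
            p, r = self._locate(row)
            with self.locks[p]:
                block.append(self.local[p][r][col_lo:col_hi])
        return block

    def put(self, row_lo, col_lo, block):
        """Overwrite a block whose top-left corner is (row_lo, col_lo)."""
        for i, values in enumerate(block):
            p, r = self._locate(row_lo + i)
            with self.locks[p]:
                for j, v in enumerate(values):
                    self.local[p][r][col_lo + j] = v

    def acc(self, row_lo, col_lo, block):
        """Add a block into the array; the owner's lock guards each row."""
        for i, values in enumerate(block):
            p, r = self._locate(row_lo + i)
            with self.locks[p]:
                for j, v in enumerate(values):
                    self.local[p][r][col_lo + j] += v


def demo():
    # 5 rows over 2 simulated processes: rows 0-2 and rows 3-4.
    ga = ToyGlobalArray(nrows=5, ncols=4, nprocs=2)
    # This block covers rows 2 and 3, so it spans both owners.
    ga.put(2, 1, [[1.0, 2.0], [3.0, 4.0]])

    def worker():
        for _ in range(100):
            ga.acc(2, 1, [[1.0, 1.0], [1.0, 1.0]])

    threads = [threading.Thread(target=worker) for _ in range(4)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return ga.get(2, 4, 1, 3)
```

In `demo`, four workers accumulate into the same block concurrently and none of them needs to know which simulated process holds which rows. In a real distributed-memory setting the lock and list accesses would be replaced by whatever communication mechanism the platform offers, which the sketch does not attempt to model.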
