Communication Optimizations for Irregular Scientific Computations on Distributed Memory Architectures

Abstract This paper describes a number of optimizations that support the efficient execution of irregular problems on distributed memory parallel machines. These optimizations are realized as runtime primitives that (1) coordinate interprocessor data movement, (2) manage the storage of, and access to, copies of off-processor data, (3) minimize interprocessor communication requirements, and (4) support a shared name space. We present a detailed performance and scalability analysis of these communication primitives, carried out using a workload generator, kernels from real applications, and a large unstructured adaptive application (the molecular dynamics code CHARMM).
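The four capabilities listed in the abstract correspond to the inspector/executor pattern used by runtime systems for irregular problems: an inspector scans the indirection array to build a reusable communication schedule, an executor moves the off-processor data, and accesses through global indices are resolved against either the local partition or the fetched copies. The sketch below is illustrative only; all function names are hypothetical and the remote "processor" is simulated with a dictionary rather than real message passing.

```python
# Hedged sketch of an inspector/executor gather with a reusable
# communication schedule. Names are illustrative, not the paper's API;
# interprocessor communication is simulated by a dictionary lookup.

def build_schedule(indirection, local_lo, local_hi):
    """Inspector: scan the indirection array, record which global indices
    live off-processor, and deduplicate them so each off-processor datum
    is fetched only once (data-copy reuse)."""
    off_proc = sorted({g for g in indirection if not (local_lo <= g < local_hi)})
    ghost_slot = {g: i for i, g in enumerate(off_proc)}  # global -> ghost index
    return off_proc, ghost_slot

def gather(schedule, fetch):
    """Executor: move the off-processor data once per schedule.
    `fetch` stands in for the actual interprocessor communication."""
    off_proc, _ = schedule
    return [fetch(g) for g in off_proc]

def deref(schedule, local, ghosts, local_lo, g):
    """Shared-name-space access: translate a global index to either the
    local partition or the ghost (off-processor copy) region."""
    _, ghost_slot = schedule
    if g in ghost_slot:
        return ghosts[ghost_slot[g]]
    return local[g - local_lo]

# Toy usage: this processor owns global indices [0, 4); a remote one owns [4, 8).
remote = {4: 40.0, 5: 50.0, 6: 60.0, 7: 70.0}
local = [0.0, 10.0, 20.0, 30.0]
indirection = [1, 5, 5, 2, 7]          # the repeated 5 is fetched only once
sched = build_schedule(indirection, 0, 4)
ghosts = gather(sched, remote.__getitem__)
values = [deref(sched, local, ghosts, 0, g) for g in indirection]
```

Because the schedule depends only on the indirection pattern, it can be built once and reused across many executor calls, which is where the communication savings for adaptive codes come from.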
