Run-Time Scheduling and Execution of Loops on Message Passing Machines

Abstract We examine the effectiveness of optimizations aimed at allowing distributed machines to efficiently compute inner loops over globally defined data structures. Our optimizations specifically target loops in which some array references are made through a level of indirection. Unstructured mesh codes and sparse matrix solvers are examples of programs with kernels of this sort. Experimental data that quantify the performance obtainable using the methods discussed here are included.
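
To make the loop class concrete, the following is a minimal sequential sketch of a kernel with indirect array references, here a sparse matrix-vector product in compressed sparse row form. It is an illustration only and is not taken from the paper; the function name `sparse_matvec` and the arrays `row_ptr`, `col_idx`, and `values` are assumed names chosen for this example. The point is that `x` is read through `col_idx`, so which elements of `x` a given iteration touches is known only at run time, which is what makes such loops hard to distribute statically.

```python
def sparse_matvec(row_ptr, col_idx, values, x):
    """Compute y = A x for a matrix A stored in CSR form (illustrative names)."""
    y = [0.0] * (len(row_ptr) - 1)
    for i in range(len(y)):
        # Nonzeros of row i occupy positions row_ptr[i] .. row_ptr[i + 1] - 1.
        for k in range(row_ptr[i], row_ptr[i + 1]):
            # Indirect reference: the index into x comes from col_idx,
            # so the access pattern depends on the data, not on i alone.
            y[i] += values[k] * x[col_idx[k]]
    return y


# Small 3x3 example: A = [[2, 0, 1], [0, 3, 0], [4, 0, 5]]
row_ptr = [0, 2, 3, 5]
col_idx = [0, 2, 1, 0, 2]
values = [2.0, 1.0, 3.0, 4.0, 5.0]
x = [1.0, 2.0, 3.0]
y = sparse_matvec(row_ptr, col_idx, values, x)
```

On a message passing machine, `x` and `y` would be partitioned across processors, and any `x[col_idx[k]]` owned by another processor would have to be fetched by communication; the sketch above deliberately leaves that part out.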
