Techniques for Compiling Programs on Distributed Memory Multicomputers

Abstract: It is widely accepted that distributed memory parallel computers will play an important role in solving computation-intensive problems. However, designing an algorithm for a distributed memory system is time-consuming and error-prone, because the programmer must manage both parallelism and communication. In this paper, we present techniques for compiling programs on distributed memory parallel computers. We study the storage management of data arrays and the execution scheduling of Do-loop programs on these machines. First, we introduce formulas for representing the distribution of specific data arrays across processors. Then, we define the communication cost of several message-passing communication operations. Next, we derive a dynamic programming algorithm for data distribution. After that, we show how to reduce communication time by pipelining data, and illustrate how to use data-dependence information to do so. Jacobi's iterative algorithm and the Gaussian elimination algorithm for linear systems are used to illustrate our method. We also present experimental results on a 32-node nCUBE-2 computer.
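To make the idea of a data distribution formula concrete, the following is a minimal sketch of the two textbook mappings (block and cyclic) of a one-dimensional array of n elements onto p processors. It is not taken from the paper: the function names and the choice of zero-based indexing are assumptions made for illustration, and the paper's own formulas may differ.

```python
# Illustrative sketch only: standard block and cyclic distributions of a
# 1-D array with n elements over p processors (zero-based indices).
# All names here are hypothetical and not the paper's notation.

def block_owner(i, n, p):
    """Processor that owns global index i under a block distribution."""
    block = -(-n // p)  # ceil(n / p): elements per processor
    return i // block

def cyclic_owner(i, p):
    """Processor that owns global index i under a cyclic distribution."""
    return i % p

def block_local_indices(rank, n, p):
    """Global indices stored on processor `rank` under a block distribution."""
    block = -(-n // p)
    lo = rank * block
    hi = min(n, lo + block)  # the last processor may hold fewer elements
    return range(lo, hi)

if __name__ == "__main__":
    n, p = 10, 4
    print([block_owner(i, n, p) for i in range(n)])
    print([cyclic_owner(i, p) for i in range(n)])
    print([list(block_local_indices(r, n, p)) for r in range(p)])
```

A block distribution keeps neighbouring elements on the same processor, which tends to suit nearest-neighbour updates such as Jacobi iteration, while a cyclic distribution spreads a shrinking active region more evenly, which is the usual argument for it in Gaussian elimination.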
