Communication generation for data-parallel languages

Data-parallel languages allow programmers to use a familiar machine-independent programming style to develop programs for multiprocessor systems. These languages relieve users of the tedious task of inserting interprocessor communication and delegate this crucial and error-prone task to the compiler. Since remote access in hierarchical multiprocessor systems is orders of magnitude slower than access to a processor's local memory, interprocessor communication adds significant overhead to the total execution time. The success of data-parallel languages therefore depends heavily on the compiler's ability to reduce communication overhead. This dissertation describes novel techniques for communication generation. It covers issues related to communication analysis, placement, and optimization. The techniques have been implemented in the Rice Fortran D95 research compiler, a High Performance Fortran (HPF) compiler being developed at Rice University. A major contribution of the dissertation is a data-flow analysis framework that supports communication placement and optimization in the presence of machine-dependent resource constraints. Examples of resource constraints include in-core memory size, cache size, and the number of physical registers. Placement and optimization that ignore resource constraints can produce incorrect communication placement, performance loss, or both. This work also describes how data-dependence information can be combined with data-flow analysis to broaden the scope of some well-known communication optimizations. Finally, the dissertation presents communication generation techniques for the cyclic(k) distributions supported by HPF. It presents efficient algorithms for computing local addresses and for generating communication sets. These techniques exploit the repetitive pattern exhibited by cyclic(k) accesses.
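
To make the repetitive pattern of cyclic(k) accesses concrete, the following Python sketch maps global array indices to owning processors and local addresses under the standard block-cyclic mapping, then lists the local-address gaps between successive elements of a strided access on one processor. It is a brute-force illustration only, not the dissertation's algorithms; the function names, the 0-based indexing, and the alignment of the array directly to the processors are all assumptions made for the example.

```python
def owner(i, k, p):
    """Processor owning global index i (0-based) under cyclic(k) on p processors."""
    return (i // k) % p


def local_address(i, k, p):
    """Offset of global index i inside its owner's local array."""
    # Each full round of k*p elements contributes one block of k local elements.
    return (i // (k * p)) * k + (i % k)


def local_gaps(lo, hi, stride, k, p, proc):
    """Gaps between consecutive local addresses touched by lo:hi:stride on proc.

    Hypothetical helper: enumerates every element of the section, which a real
    compiler would avoid by exploiting the periodicity of the gap sequence.
    """
    addrs = [
        local_address(i, k, p)
        for i in range(lo, hi + 1, stride)
        if owner(i, k, p) == proc
    ]
    return [b - a for a, b in zip(addrs, addrs[1:])]


if __name__ == "__main__":
    # cyclic(4) over 4 processors, section 0:300:9, viewed from processor 0
    print(local_gaps(0, 300, 9, k=4, p=4, proc=0))
    # prints [6, 15, 6, 9, 6, 15, 6, 9]
```

In this example the gap sequence 6, 15, 6, 9 repeats, so a table of four gaps is enough to walk processor 0's share of the section without testing ownership for every element.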

[1]  Joel H. Saltz,et al.  Communication Optimizations for Irregular Scientific Computations on Distributed Memory Architectures , 1994, J. Parallel Distributed Comput..

[2]  Ken Kennedy,et al.  Fortran D Language Specification , 1990 .

[3]  Chau-Wen Tseng An optimizing Fortran D compiler for MIMD distributed-memory machines , 1993 .

[4]  Ken Kennedy,et al.  Efficient address generation for block-cyclic distributions , 1995, ICS '95.

[5]  Tom MacDonald,et al.  High Performance Fortran: A Practical Analysis , 1994, Sci. Program..

[6]  Ken Kennedy,et al.  Combining dependence and data-flow analyses to optimize communication , 1995, Proceedings of 9th International Parallel Processing Symposium.

[7]  David R. O'Hallaron,et al.  Languages, Compilers and Run-Time Systems for Scalable Computers , 1998, Springer US.

[8]  Alexander V. Veidenbaum,et al.  Detecting redundant accesses to array data , 1991, Proceedings of the 1991 ACM/IEEE Conference on Supercomputing (Supercomputing '91).

[9]  Ken Kennedy,et al.  Unified compilation of Fortran 77D and 90D , 1993, LOPL.

[10]  Ken Kennedy,et al.  A communication placement framework with unified dependence and data-flow analysis , 1996, Proceedings of 3rd International Conference on High Performance Computing (HiPC).

[11]  Alfred V. Aho,et al.  Compilers: Principles, Techniques, and Tools , 1986, Addison-Wesley series in computer science / World student series edition.

[12]  K. Kennedy,et al.  Preliminary experiences with the Fortran D compiler , 1993, Supercomputing '93.

[13]  Dhananjay M. Dhamdhere A fast algorithm for code movement optimisation , 1988, SIGP.

[14]  Ken Kennedy,et al.  An Implementation of Interprocedural Bounded Regular Section Analysis , 1991, IEEE Trans. Parallel Distributed Syst..

[15]  Edith Schonberg,et al.  A Unified Data-Flow Framework for Optimizing Communication , 1994, LCPC.

[16]  Edith Schonberg,et al.  A Framework for Exploiting Data Availability to Opimize Communication , 1993, LCPC.

[17]  Rami G. Melhem,et al.  Compilation Techniques for Optimizing Communication on Distributed-Memory Systems , 1993, 1993 International Conference on Parallel Processing - ICPP'93.

[18]  Monica S. Lam,et al.  Global optimizations for parallelism and locality on scalable parallel machines , 1993, PLDI '93.

[19]  Ken Kennedy,et al.  Resource-Based Communication Placement Analysis , 1996, LCPC.

[20]  Michael Gerndt,et al.  Updating Distributed Variables in Local Computations , 1990, Concurr. Pract. Exp..

[21]  M. Wegman,et al.  Global value numbers and redundant computations , 1988, POPL '88.

[22]  Ulrich Kremer,et al.  Compositional Oil Reservoir Simulation in Fortran D: a Feasibility Study On Intel iPsc/860 , 1994, Int. J. High Perform. Comput. Appl..

[23]  Geoffrey C. Fox,et al.  Compiling Fortran 90D/HPF for Distributed Memory MIMD Computers , 1994, J. Parallel Distributed Comput..

[24]  Etienne Morel,et al.  Global optimization by suppression of partial redundancies , 1979, CACM.

[25]  Ken Kennedy,et al.  Computer support for machine-independent parallel programming in Fortran D , 1992 .

[26]  Bernhard Steffen,et al.  Optimal code motion: theory and practice , 1994, TOPL.

[27]  Alan L. Cox,et al.  TreadMarks: Distributed Shared Memory on Standard Workstations and Operating Systems , 1994, USENIX Winter.

[28]  Ken Kennedy,et al.  A model and compilation strategy for out-of-core data parallel programs , 1995, PPOPP '95.

[29]  Barbara M. Chapman,et al.  Programming in Vienna Fortran , 1992, Sci. Program..

[30]  Ken Kennedy,et al.  Compiler Analysis for Irregular Problems in Fortran D , 1992, LCPC.

[31]  Vasanth Balasundaram A Mechanism for Keeping Useful Internal Information in Parallel Programming Tools: The Data Access Descriptor , 1990, J. Parallel Distributed Comput..

[32]  Anoop Gupta,et al.  Design and evaluation of a compiler algorithm for prefetching , 1992, ASPLOS V.

[33]  Charles Koelbel,et al.  Compiling global name-space programs for distributed execution , 1990 .

[34]  Bernhard Steffen,et al.  Lazy code motion , 1992, PLDI '92.

[35]  von Hanxledenreinhard D Newsletter #9 Handling Irregular Problems with Fortran D | a Preliminary Report Handling Irregular Problems with Fortran D | a Preliminary Report , 1993 .

[36]  J. Cocke Global common subexpression elimination , 1970, Symposium on Compiler Optimization.

[37]  W. Kelly,et al.  Code generation for multiple mappings , 1995, Proceedings Frontiers '95. The Fifth Symposium on the Frontiers of Massively Parallel Computation.

[38]  Monica S. Lam,et al.  Communication optimization and code generation for distributed memory machines , 1993, PLDI '93.

[39]  Prithviraj Banerjee,et al.  Advanced compilation techniques in the PARADIGM compiler for distributed-memory multicomputers , 1995, ICS '95.

[40]  Ken Kennedy,et al.  Evaluating Compiler Optimizations for Fortran D , 1994, J. Parallel Distributed Comput..

[41]  Ken Kennedy,et al.  Compilation techniques for block-cyclic distributions , 1994, International Conference on Supercomputing.

[42]  Ken Kennedy,et al.  Compiling Fortran D for MIMD distributed-memory machines , 1992, CACM.

[43]  Ken Kennedy,et al.  Compiler optimizations for Fortran D on MIMD distributed-memory machines , 1991, Proceedings of the 1991 ACM/IEEE Conference on Supercomputing (Supercomputing '91).

[44]  Edith Schonberg,et al.  An HPF Compiler for the IBM SP2 , 1995, Proceedings of the IEEE/ACM SC95 Conference.

[45]  Ken Kennedy,et al.  Analysis of interprocedural side effects in a parallel programming environment , 1988, J. Parallel Distributed Comput..

[46]  Guy L. Steele,et al.  The High Performance Fortran Handbook , 1993 .

[47]  Thomas R. Gross,et al.  Generating Communication for Array Statement: Design, Implementation, and Evaluation , 1994, J. Parallel Distributed Comput..

[48]  Peter Brezany,et al.  Processing Array Statements and Procedure Interfaces in the PREPARE High Performance Fortran Compiler , 1994, CC.

[49]  Sandeep K. S. Gupta,et al.  On Compiling Array Expressions for Efficient Execution on Distributed-Memory Machines , 1993, 1993 International Conference on Parallel Processing - ICPP'93.

[50]  John R. Gilbert,et al.  Generating local addresses and communication sets for data-parallel programs , 1993, PPOPP '93.

[51]  Corinne Ancourt,et al.  Scanning polyhedra with DO loops , 1991, PPOPP '91.

[52]  P. Feautrier Parametric integer programming , 1988 .

[53]  Ken Kennedy,et al.  GIVE-N-TAKE—a balanced code placement framework , 1994, PLDI '94.

[54]  Charles Koelbel Compile-time generation of regular communications patterns , 1991, Proceedings of the 1991 ACM/IEEE Conference on Supercomputing (Supercomputing '91).

[55]  Ken Kennedy,et al.  Communication Generation for Cyclic(K) Distributions , 1996 .

[56]  Marina Chen,et al.  Automating the Coordination of Interprocessor Communication , 1990 .

[57]  David S. Johnson,et al.  Computers and Intractability: A Guide to the Theory of NP-Completeness , 1978 .

[58]  Monica S. Lam,et al.  Array-data flow analysis and its use in array privatization , 1993, POPL '93.

[59]  Piyush Mehrotra,et al.  Vienna Fortran—a Fortran language extension for distributed memory multiprocessors , 1992 .

[60]  Ravi Mirchandaney,et al.  Improving the performance of DSM systems via compiler involvement , 1994, Proceedings of Supercomputing '94.

[61]  Reinhard von Hanxleden,et al.  Compiler support for machine-independent parallelization of irregular problems , 1994, Rice COMP TR.

[62]  William Pugh,et al.  A practical algorithm for exact array dependence analysis , 1992, CACM.

[63]  Message Passing Interface Forum MPI: A message - passing interface standard , 1994 .

[64]  Chau-Wen Tseng,et al.  An Overview of the SUIF Compiler for Scalable Parallel Machines , 1995, PPSC.

[65]  David A. Bell,et al.  Distributed database systems , 1992 .

[66]  Manfred P. Stadel,et al.  A solution to a problem with Morel and Renvoise's “Global optimization by suppression of partial redundancies” , 1988, TOPL.

[67]  Ken Kennedy,et al.  A linear-time algorithm for computing the memory access sequence in data-parallel programs , 1995, PPOPP '95.

[68]  James M. Stichnoth Efficient Compilation of Array Statements for Private Memory Multicomputers , 1993 .

[69]  Hans P. Zima,et al.  Compiling for distributed-memory systems , 1993 .

[70]  Prithviraj Banerjee,et al.  Processor Tagged Descriptors: A Data Structure for Compiling for Distributed-Memory Multicomputers , 1994, IFIP PACT.

[71]  R. van de Geijn,et al.  A look at scalable dense linear algebra libraries , 1992, Proceedings Scalable High Performance Computing Conference SHPCC-92..

[72]  Michael Gerndt,et al.  Array distribution in SUPERB , 1989, ICS '89.

[73]  William Pugh,et al.  The Omega Library interface guide , 1995 .

[74]  Larry Meadows,et al.  Compiling High Performance Fortran , 1995, PPSC.

[75]  Michael Gerndt,et al.  Automatic parallelization for distributed-memory multiprocessing systems , 1989 .