A Unified Framework for Optimizing Communication in Data-Parallel Programs

This paper presents a framework, based on global array data-flow analysis, to reduce communication costs in a program being compiled for a distributed memory machine. We introduce available section descriptor, a novel representation of communication involving array sections. This representation allows us to apply techniques for partial redundancy elimination to obtain powerful communication optimizations. With a single framework, we are able to capture optimizations like (1) vectorizing communication, (2) eliminating communication that is redundant on any control flow path, (3) reducing the amount of data being communicated, (4) reducing the number of processors to which data must be communicated, and (5) moving communication earlier to hide latency, and to subsume previous communication. We show that the bidirectional problem of eliminating partial redundancies can be decomposed into simpler unidirectional problems even in the context of an array section representation, which makes the analysis procedure more efficient. We present results from a preliminary implementation of this framework, which are extremely encouraging, and demonstrate the effectiveness of this analysis in improving the performance of programs.

[1]  Monica S. Lam,et al.  Communication optimization and code generation for distributed memory machines , 1993, PLDI '93.

[2]  John R. Gilbert,et al.  Optimal evaluation of array expressions on massively parallel machines , 1995, TOPL.

[3]  Etienne Morel,et al.  Global optimization by suppression of partial redundancies , 1979, CACM.

[4]  Manish Gupta,et al.  A methodology for high-level synthesis of communication on multicomputers , 1992, ICS '92.

[5]  Rice UniversityCORPORATE,et al.  High performance Fortran language specification , 1993 .

[6]  Ken Kennedy,et al.  GIVE-N-TAKE—a balanced code placement framework , 1994, PLDI '94.

[7]  Dhananjay M. Dhamdhere,et al.  A composite hoisting-strength reduction transformation for global program optimization part ii , 1982 .

[8]  Robert E. Tarjan,et al.  Testing flow graph reducibility , 1973, J. Comput. Syst. Sci..

[9]  Philip J. Hatcher,et al.  Data-parallel programming on multicomputers , 1990, IEEE Software.

[10]  Ken Kennedy,et al.  Incremental dependence analysis , 1990 .

[11]  John Cocke,et al.  A program data flow analysis procedure , 1976, CACM.

[12]  Prithviraj Banerjee,et al.  Automating Parallelism of Regular Computations for Distributed-Memory Multicomputers in the Paradigm Compiler , 1993, 1993 International Conference on Parallel Processing - ICPP'93.

[13]  Charles Howard Koelbel,et al.  Compiling programs for nonshared memory machines , 1991 .

[14]  Fred C. Chow,et al.  A portable machine-independent global optimizer--design and measurements , 1984 .

[15]  Bernhard Steffen,et al.  Lazy code motion , 1992, PLDI '92.

[16]  Ken Kennedy,et al.  Compiling Fortran D for MIMD distributed-memory machines , 1992, CACM.

[17]  Ken Kennedy,et al.  Analysis of interprocedural side effects in a parallel programming environment , 1988, J. Parallel Distributed Comput..

[18]  Ken Kennedy,et al.  Compiler Analysis for Irregular Problems in Fortran D , 1992, LCPC.

[19]  Vasanth Balasundaram A Mechanism for Keeping Useful Internal Information in Parallel Programming Tools: The Data Access Descriptor , 1990, J. Parallel Distributed Comput..

[20]  Thomas R. Gross,et al.  Structured dataflow analysis for arrays and its use in an optimizing compiler , 1990, Softw. Pract. Exp..

[21]  Anne Rogers,et al.  Process decomposition through locality of reference , 1989, PLDI '89.

[22]  Edith Schonberg,et al.  An HPF Compiler for the IBM SP2 , 1995, Proceedings of the IEEE/ACM SC95 Conference.

[23]  Michael G. Burke An interval-based approach to exhaustive and incremental interprocedural data-flow analysis , 1990, TOPL.

[24]  Dhananjay M. Dhamdhere,et al.  How to analyze large programs efficiently and informatively , 1992, PLDI '92.

[25]  Charles Koelbel,et al.  Compiling Global Name-Space Parallel Loops for Distributed Execution , 1991, IEEE Trans. Parallel Distributed Syst..

[26]  Alfred V. Aho,et al.  Compilers: Principles, Techniques, and Tools , 1986, Addison-Wesley series in computer science / World student series edition.

[27]  Ken Kennedy,et al.  An Implementation of Interprocedural Bounded Regular Section Analysis , 1991, IEEE Trans. Parallel Distributed Syst..

[28]  Edith Schonberg,et al.  A Framework for Exploiting Data Availability to Opimize Communication , 1993, LCPC.

[29]  Rami G. Melhem,et al.  Compilation Techniques for Optimizing Communication on Distributed-Memory Systems , 1993, 1993 International Conference on Parallel Processing - ICPP'93.

[30]  Michael Gerndt,et al.  SUPERB: A tool for semi-automatic MIMD/SIMD parallelization , 1988, Parallel Comput..

[31]  Marina C. Chen,et al.  Compiling Communication-Efficient Programs for Massively Parallel Machines , 1991, IEEE Trans. Parallel Distributed Syst..

[32]  Alexander V. Veidenbaum,et al.  Detecting redundant accesses to array data , 1991, Proceedings of the 1991 ACM/IEEE Conference on Supercomputing (Supercomputing '91).

[33]  Philip J. Hatcher,et al.  Data-Parallel Programming on MIMD Computers , 1991, IEEE Trans. Parallel Distributed Syst..