A global communication optimization technique based on data-flow analysis and linear algebra

Reducing communication overhead is extremely important in distributed-memory message-passing architectures. In this article, we present a technique for optimizing communication that considers the data access patterns of the entire program. Our approach combines traditional data-flow analysis with a linear algebra framework, and it works on structured programs with conditional statements and nested loops but without arbitrary goto statements. The distinctive features of the solution are the accuracy with which communication set information is maintained, support for general alignments and distributions including block-cyclic distributions, and the ability to simulate some of the previous approaches with suitable modifications. We also show how optimizations such as message vectorization, message coalescing, and redundancy elimination are supported by our framework. Experimental results on several benchmarks show that our technique is effective in reducing the number of messages (an average reduction of 32%), the volume of data communicated (an average reduction of 37%), and the execution time (an average reduction of 26%).
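
As a concrete illustration of one of the optimizations named above, the sketch below (not taken from the paper; a hand-written C/MPI example with assumed array names a and b and assumed size N) shows message vectorization: the process that owns b sends the entire slice the consumer will read in a single message before the loop runs, instead of sending one element per iteration.

/* Minimal sketch of message vectorization with MPI (assumed example, not the
   paper's generated code). Process 0 owns b[]; process 1 computes
   a[i] = b[i] + 1 for all N elements after receiving the whole slice once. */
#include <mpi.h>
#include <stdio.h>

#define N 8

int main(int argc, char **argv) {
    MPI_Init(&argc, &argv);
    int rank;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    double a[N], b[N];

    if (rank == 0) {
        for (int i = 0; i < N; i++) b[i] = (double)i;
        /* Vectorized communication: one message carrying the whole slice of b,
           rather than N single-element sends issued inside the loop. */
        MPI_Send(b, N, MPI_DOUBLE, 1, 0, MPI_COMM_WORLD);
    } else if (rank == 1) {
        MPI_Recv(b, N, MPI_DOUBLE, 0, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
        for (int i = 0; i < N; i++) a[i] = b[i] + 1.0;  /* purely local work */
        printf("a[N-1] = %g\n", a[N - 1]);
    }

    MPI_Finalize();
    return 0;
}

Compiled with mpicc and run with two processes (e.g., mpirun -np 2 ./a.out), the consumer performs only local computation inside the loop; message coalescing extends the same idea by merging the slices needed by several references into a single transfer.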
