Buffer-Safe and Cost-Driven Communication Optimization

This paper presents a new approach to optimizing communication in data parallel programs. Our techniques are based on unidirectional bit-vector data flow analyses that enable vectorizing, coalescing, and aggregating communication, and overlapping communication with computation, both within and across loop nests. Previous techniques rely on fixed communication optimization strategies whose quality is very sensitive to changes in machine and problem size. Our algorithm is novel in that it carefully examines the tradeoffs between improving communication latency hiding and reducing the number and volume of messages. It does so by systematically evaluating a reasonable set of promising communication placements for a given program, covering several (possibly conflicting) profit motives that guide communication placement. We use P3T, a state-of-the-art performance estimator, to ensure communication buffer safety and to select the best of the generated communication placements. Initial results show that our method significantly reduces communication costs and demonstrate the effectiveness of this analysis in improving program performance.
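The selection step described above (discard placements that are not buffer safe, then pick the cheapest of the rest) can be illustrated with a minimal sketch. This is not the paper's algorithm and not P3T's interface: the `Placement` record, the `estimated_cost` function, and the `startup` and `per_byte` machine parameters are hypothetical stand-ins for the figures a performance estimator would supply.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Placement:
    """Hypothetical summary of one candidate communication placement."""
    name: str
    num_messages: int       # messages sent per processor
    bytes_per_message: int  # payload size of each message
    overlap_time: float     # seconds of transfer hidden behind computation
    buffer_bytes: int       # peak communication buffer space required


def estimated_cost(p, startup=50e-6, per_byte=10e-9):
    """Assumed linear cost model: per-message startup plus per-byte time,
    reduced by whatever part of the transfer overlaps with computation."""
    transfer = p.num_messages * (startup + p.bytes_per_message * per_byte)
    return max(0.0, transfer - p.overlap_time)


def best_placement(candidates, buffer_limit):
    """Return the cheapest placement that fits in the buffer, or None."""
    safe = [p for p in candidates if p.buffer_bytes <= buffer_limit]
    if not safe:
        return None
    return min(safe, key=estimated_cost)


if __name__ == "__main__":
    candidates = [
        Placement("per-iteration", 1000, 8, 0.0, 8),
        Placement("vectorized", 1, 8000, 0.0, 8000),
        Placement("aggregated-early", 1, 16000, 200e-6, 16000),
    ]
    choice = best_placement(candidates, buffer_limit=10_000)
    print(choice.name if choice else "no buffer-safe placement")
```

With the example values, the placement that hides the most latency is rejected when the buffer limit is too small for it, which is the kind of conflict between profit motives that a fixed strategy cannot resolve.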
