Evaluating Compiler Optimizations for Fortran D

Abstract

The Fortran D compiler uses data decomposition specifications to automatically translate Fortran programs for execution on MIMD distributed-memory machines. This paper introduces and classifies a number of advanced optimizations needed to achieve acceptable performance, and analyzes and empirically evaluates them for stencil computations. Communication optimizations reduce communication overhead by decreasing the number of messages, and hide the overhead of the remaining messages by overlapping it with local computation. Parallelism optimizations exploit parallel and pipelined computations, and may need to restructure the computation to increase parallelism. Profitability formulas are derived for each optimization. Empirical results show that exploiting parallelism in pipelined computations, reductions, and scans is vital. Message vectorization, collective communication, and efficient coarse-grain pipelining also significantly affect performance. The scalability of communication and parallelism optimizations is analyzed, and the effectiveness of communication optimizations is shown to be dictated by the ratio of communication to computation in the program. An optimization strategy is developed based on these analyses.
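To make two of these trade-offs concrete, the sketch below assumes a simple linear message-cost model in which every message costs a startup time alpha plus beta per element transferred. The model, the parameter names, and the functions `vectorization_savings`, `pipeline_time`, and `best_block_size` are illustrative assumptions, not the profitability formulas derived in the paper. Under this model, message vectorization saves one startup cost per message eliminated. For coarse-grain pipelining, the sketch assumes P processors that each process N columns in order, sending one boundary message per block of B columns. Small blocks pay many startups, while large blocks lengthen pipeline fill.

```python
import math

# Assumed cost model (illustrative only): a message of m elements costs
# alpha + beta * m; computing one column locally costs t_comp.


def vectorization_savings(n, alpha, beta):
    """Time saved by sending n elements in one message instead of n messages."""
    unvectorized = n * (alpha + beta)  # n separate single-element messages
    vectorized = alpha + n * beta      # one message carrying all n elements
    return unvectorized - vectorized   # equals (n - 1) * alpha


def pipeline_time(n, p, b, t_comp, alpha, beta):
    """Estimated time for a pipelined sweep of n columns over p processors.

    Each stage computes b columns and then sends one message of b boundary
    elements. The last processor starts after p - 1 stages of fill. A short
    final block is approximated as a full block.
    """
    stages = math.ceil(n / b) + p - 1
    stage_cost = b * (t_comp + beta) + alpha
    return stages * stage_cost


def best_block_size(n, p, t_comp, alpha, beta):
    """Block size that minimizes the continuous form of pipeline_time.

    T(B) = (N/B + P - 1)(B*(t_comp + beta) + alpha); setting dT/dB = 0
    gives B* = sqrt(N * alpha / ((P - 1) * (t_comp + beta))).
    """
    if p <= 1:
        return n  # no pipeline: a single processor does everything
    b = math.sqrt(n * alpha / ((p - 1) * (t_comp + beta)))
    return max(1, min(n, round(b)))  # clamp to a valid integer block size
```

For example, with alpha = 100, beta = 0, unit compute cost per column, 1024 columns, and 5 processors, `best_block_size` returns 160. The estimated pipelined time (2860 units) is then lower than both fine-grain pipelining with one column per message and an unpipelined sweep with one block of 1024 columns.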