Advanced Code Generation for High Performance Fortran

For data-parallel languages such as High Performance Fortran to achieve wide acceptance, parallelizing compilers must be able to provide consistently high performance for a broad spectrum of scientific applications. Although compilation of regular data-parallel applications for message-passing systems has been widely studied, current state-of-the-art compilers implement only a small number of key optimizations, and the implementations generally focus on optimizing programs using a "case-based" approach. For these reasons, current compilers are unable to provide consistently high levels of performance. In this paper, we describe techniques developed in the Rice dHPF compiler to address key code generation challenges that arise in achieving high performance for regular applications on message-passing systems. We focus on techniques required to implement advanced optimizations and to achieve consistently high performance with existing optimizations. Many of the core communication analysis and code generation algorithms in dHPF are expressed in terms of abstract equations manipulating integer sets. This approach enables general and yet simple implementations of sophisticated optimizations, making it more practical to include a comprehensive set of optimizations in data-parallel compilers. It also enables the compiler to support much more aggressive computation partitioning algorithms than in previous compilers. We therefore believe this approach can provide higher and more consistent levels of performance than are available today.
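
To make the idea of analysis expressed as equations over integer sets concrete, the following is a minimal sketch and not the dHPF implementation. It assumes a hypothetical loop `do i = 1, N-2: A(i) = B(i-1) + B(i+1)` with `A` and `B` BLOCK-distributed over `P` processors and the owner-computes rule. The names `owned`, `local_iterations`, `referenced`, and `nonlocal_reads` and the values of `N` and `P` are illustrative assumptions, and explicit Python sets stand in for whatever set representation a compiler would actually use.

```python
# Illustrative only: explicit Python sets stand in for the integer sets
# a compiler would manipulate. All names and sizes here are hypothetical.

N = 16                    # array extent (assumed)
P = 4                     # number of processors (assumed)
BLOCK = (N + P - 1) // P  # block size for a BLOCK distribution


def owned(p):
    """Indices of A and B owned by processor p under a BLOCK distribution."""
    return set(range(p * BLOCK, min((p + 1) * BLOCK, N)))


# Iteration space of the example loop: do i = 1, N-2
LOOP = set(range(1, N - 1))


def local_iterations(p):
    """Owner-computes rule: p runs iteration i if and only if it owns A(i)."""
    return LOOP & owned(p)


def referenced(p):
    """Elements of B read by p's iterations through B(i-1) and B(i+1)."""
    its = local_iterations(p)
    return {i - 1 for i in its} | {i + 1 for i in its}


def nonlocal_reads(p):
    """Elements p must receive from other processors: referenced minus owned."""
    return referenced(p) - owned(p)


if __name__ == "__main__":
    for p in range(P):
        print(p, sorted(nonlocal_reads(p)))
```

Each quantity is defined by one set equation (intersection, image under a subscript, difference), so the same definitions would apply unchanged to other loop bounds or distributions. Enumerating the sets only works when all sizes are known; a compiler would generally need a symbolic representation, since bounds and processor counts are often unknown at compile time.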
