GRAPHITE: Polyhedral Analyses and Optimizations for GCC

We present a plan to add loop nest optimizations to GCC based on polyhedral representations of loop nests. We advocate a static analysis approach based on a hierarchy of interchangeable abstractions, with solvers that range from exact solvers such as OMEGA to faster but less precise solvers based on coarser abstractions. The intermediate representation GRAPHITE (GIMPLE Represented as Polyhedra with Interchangeable Envelopes), built on GIMPLE and the natural loops, hosts the high-level loop transformations. We base this presentation on the WRaP-IT project developed in the Alchemy group at INRIA Futurs and Paris-Sud University, on the PIPS compiler developed at École des mines de Paris, and on joint work with several members of the static analysis and polyhedral compilation community in France. The main goal of this project is to bring more high-level loop optimizations to GCC: loop fusion, tiling, strip mining, etc. Thanks to the WRaP-IT experience, we know that polyhedral analyses and transformations are affordable in a production compiler. A second goal of this project is to experiment with the trade-off between compile-time reduction and attainable precision when operations on polyhedra are replaced with faster operations on more abstract domains. However, computing with too coarse a representation may yield an over-approximated solution that cannot be used in subsequent computations. The trade-off between the speed of the computation and the attainable precision has not yet been analyzed for real-world programs.
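The precision trade-off mentioned above can be illustrated with a small sketch. It is not taken from GRAPHITE or from any of the solvers named here: the function names are hypothetical, and the points are enumerated explicitly only to keep the example short, whereas a real solver would reason symbolically. The sketch compares the exact iteration domain of a triangular loop nest (`0 <= i < n`, `0 <= j <= i`) with a coarser interval abstraction that keeps only the minimum and maximum of each variable.

```python
from itertools import product


def exact_domain(n):
    """Integer points of the triangular nest: 0 <= i < n, 0 <= j <= i."""
    return {(i, j) for i in range(n) for j in range(i + 1)}


def box_envelope(points):
    """Interval abstraction (assumed for illustration): per-variable bounds only."""
    i_vals = [i for i, _ in points]
    j_vals = [j for _, j in points]
    return (min(i_vals), max(i_vals)), (min(j_vals), max(j_vals))


def box_points(box):
    """All integer points contained in the bounding box."""
    (i_lo, i_hi), (j_lo, j_hi) = box
    return set(product(range(i_lo, i_hi + 1), range(j_lo, j_hi + 1)))


n = 4
exact = exact_domain(n)
approx = box_points(box_envelope(exact))
spurious = approx - exact  # points the coarse abstraction adds

print(len(exact), len(approx), len(spurious))  # prints: 10 16 6

# With the exact domain, "j > i" is provably impossible.
# With the box, the same question can no longer be answered "no".
print(any(j > i for i, j in exact))   # prints: False
print(any(j > i for i, j in approx))  # prints: True
```

The box is cheaper to compute and to store (two bounds per variable), but it loses the relation `j <= i`. An analysis that depends on that relation, for example to prove that two array accesses never overlap, would have to give a conservative answer when it is handed the box instead of the polyhedron.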
