Iterative Optimization in the Polyhedral Model: Part I, One-Dimensional Time

Emerging microprocessors offer unprecedented parallel computing capabilities and deeper memory hierarchies, increasing the importance of loop transformations in optimizing compilers. Because compiler heuristics rely on simplistic performance models, and because they are bound to a limited set of transformation sequences, they uncover only a fraction of the peak performance on typical benchmarks. Iterative optimization is a maturing framework to address these limitations, but so far it has not been successfully applied to complex loop transformation sequences because of the combinatorics of the optimization search space. We focus on the class of loop transformations that can be expressed as one-dimensional affine schedules. We define a systematic exploration method to enumerate the space of all legal, distinct transformations in this class. This method is based on an upstream characterization, as opposed to state-of-the-art downstream filtering approaches. Our results demonstrate orders-of-magnitude improvements in the size of the search space and in the convergence speed of a dedicated iterative optimization heuristic.
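To make "legal one-dimensional affine schedule" concrete, here is a minimal toy sketch, not the method of the paper. It assumes a single statement in a two-deep loop nest with uniform dependences, so a schedule theta(i, j) = a*i + b*j is legal when every dependence target is scheduled at least one step after its source. The dependence vectors, the coefficient bound, and all function names are hypothetical. The sketch generates candidates first and filters them afterwards, which is the downstream style the abstract contrasts with; it only shows how small the legal subset can be.

```python
from itertools import product

# Hypothetical toy input: distance vectors of uniform dependences
# in a two-deep loop nest (assumed for illustration only).
DEPENDENCES = [(1, 0), (0, 1)]


def is_legal(schedule, dependences):
    """Check theta(i, j) = a*i + b*j against every dependence.

    For a uniform dependence with distance vector d, the target runs
    theta(d) time steps after the source, so legality needs theta(d) >= 1.
    """
    return all(
        sum(coeff * dist for coeff, dist in zip(schedule, dep)) >= 1
        for dep in dependences
    )


def enumerate_schedules(bound, dependences):
    """Enumerate all (a, b) with |a|, |b| <= bound, then keep the legal ones."""
    box = range(-bound, bound + 1)
    candidates = list(product(box, repeat=2))
    legal = [s for s in candidates if is_legal(s, dependences)]
    return candidates, legal


candidates, legal = enumerate_schedules(2, DEPENDENCES)
print(f"{len(legal)} legal schedules out of {len(candidates)} candidates")
```

Even in this tiny case most candidates are rejected, which hints at why characterizing the legal set before enumerating it can shrink the search space.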
