Iterative Optimization in the Polyhedral Model: One-Dimensional Scheduling Case

Emerging microprocessors introduce unprecedented parallel computing capabilities and deeper memory hierarchies, increasing the importance of loop transformations in optimizing compilers. Because compiler heuristics rely on simplistic performance models, and because they are bound to a limited set of transformation sequences, they only uncover a fraction of the peak performance on typical benchmarks. Iterative optimization is a maturing framework addressing these limitations, but so far it has not been successfully applied to complex loop transformation sequences because of the combinatorics of the optimization search space. We focus on the class of loop transformations that can be expressed as one-dimensional affine schedules. We define a systematic exploration method to enumerate the space of all legal, distinct transformations in this class. This method is based on an upstream characterization, as opposed to state-of-the-art downstream filtering approaches. Our results demonstrate orders of magnitude improvements in the size of the search space and in the convergence speed of a dedicated iterative optimization heuristic.
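
To make the notion of a legal one-dimensional affine schedule concrete, here is a minimal Python sketch. Everything in it is an assumption made for illustration and is not taken from the paper: the toy kernel, its two constant dependence distances, the coefficient range, and the function names `is_legal` and `legal_schedules`. Note that it tests every candidate in a small coefficient box, which is the filtering style of exploration; it does not implement the upstream characterization that the abstract describes.

```python
from itertools import product

# Hypothetical toy kernel (an assumption, not from the paper):
#   for i in 1..N:
#     for j in 1..N:
#       A[i][j] = A[i-1][j] + A[i][j-1]
# Its two flow dependences have constant distance vectors (1, 0) and (0, 1).
DEPENDENCE_DISTANCES = [(1, 0), (0, 1)]


def is_legal(a, b, distances=DEPENDENCE_DISTANCES):
    """Check the schedule theta(i, j) = a*i + b*j against every dependence.

    The schedule is legal when each dependence target is scheduled at least
    one time step after its source, i.e. a*di + b*dj >= 1 for each distance.
    """
    return all(a * di + b * dj >= 1 for di, dj in distances)


def legal_schedules(lo=-2, hi=2):
    """Enumerate (a, b) in a small coefficient box and keep the legal ones."""
    coeffs = range(lo, hi + 1)
    return [(a, b) for a, b in product(coeffs, repeat=2) if is_legal(a, b)]


if __name__ == "__main__":
    found = legal_schedules()
    print(f"{len(found)} legal schedules among 25 candidates: {found}")
```

In this toy setting only 4 of the 25 candidate coefficient pairs are legal, which hints at why describing the legal space directly can pay off compared with generating candidates and discarding the illegal ones.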
