Hybrid Iterative and Model-Driven Optimization in the Polyhedral Model

On modern architectures, a missed optimization can translate into performance degradations reaching orders of magnitude. More than ever, translating Moore's law into actual performance improvements depends on the effectiveness of the compiler. Moreover, missing an optimization and putting the blame on the programmer is not a viable strategy: we must strive for portability of performance, or the majority of the software industry will see no benefit in future many-core processors. As a consequence, an optimizing compiler must also be a parallelizing one; it must take care of the memory hierarchy and of (re)partitioning computation to best suit the target architecture.

Polyhedral compilation is a program optimization and parallelization framework capable of expressing extremely complex transformation sequences. Building and traversing a tractable search space of such transformations remains challenging, and existing model-based heuristics are easily beaten when it comes to identifying profitable parallelism/locality trade-offs. We propose a hybrid iterative and model-driven algorithm for automatic tiling, fusion, distribution and parallelization of programs in the polyhedral model. Our experiments demonstrate the effectiveness of this approach, both in obtaining solid performance improvements over existing auto-parallelizing compilers and in achieving portability of performance on various modern multi-core architectures.
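The abstract does not spell out the search procedure, so the following is only a minimal sketch of what an iterative search over fusion and distribution choices could look like, not the authors' algorithm. It assumes an ordered list of statements, treats each contiguous group as one fused loop nest (separate groups are distributed loops), and ranks candidates with a caller-supplied cost. The names `fusion_structures`, `pick_best` and `toy_cost` are hypothetical, and the sketch ignores dependence legality, statement reordering, tiling and parallelization.

```python
from itertools import product


def fusion_structures(statements):
    """Yield every split of an ordered statement list into contiguous groups.

    Each group stands for one fused loop nest; distinct groups stand for
    distributed loops. For n statements there are 2**(n - 1) candidates.
    """
    n = len(statements)
    if n == 0:
        return
    # One boolean per gap between neighbouring statements: True means "cut here".
    for cuts in product((False, True), repeat=n - 1):
        groups, current = [], [statements[0]]
        for stmt, cut in zip(statements[1:], cuts):
            if cut:
                groups.append(current)
                current = [stmt]
            else:
                current.append(stmt)
        groups.append(current)
        yield groups


def pick_best(statements, measure):
    """Return the candidate with the lowest measured cost.

    `measure` is a stand-in (assumption) for compiling and timing a variant.
    """
    return min(fusion_structures(statements), key=measure)


def toy_cost(groups):
    """Hypothetical cost: one unit per loop nest, plus a penalty for
    fusing more than two statements (a crude proxy for cache pressure)."""
    return len(groups) + sum(2 * max(0, len(g) - 2) for g in groups)


if __name__ == "__main__":
    print(pick_best(["S1", "S2", "S3"], toy_cost))
```

In a real setting the cost would come from running each generated variant on the target machine, and a model-driven step would pick the remaining transformations for each candidate; both are left out of this sketch.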
