Polly - Performing Polyhedral Optimizations on a Low-Level Intermediate Representation

The polyhedral model for loop parallelization has proved to be an effective tool for advanced optimization and automatic parallelization of programs in higher-level languages. Yet, to integrate such optimizations seamlessly into production compilers, they must be performed on the compiler's internal, low-level, intermediate representation (IR). With Polly, we present an infrastructure for polyhedral optimizations on such an IR. We describe the detection of program parts amenable to a polyhedral optimization (so-called static control parts), their translation to a Z-polyhedral representation, optimizations on this representation and the generation of optimized IR code. Furthermore, we define an interface for connecting external optimizers and present a novel way of using the parallelism they introduce to generate SIMD and OpenMP code. To evaluate Polly, we compile the PolyBench 2.0 benchmarks fully automatically with PLuTo as external optimizer and parallelizer. We can report on significant speedups.

[1]  Uday Bondhugula,et al.  A practical automatic polyhedral parallelizer and locality optimizer , 2008, PLDI '08.

[2]  Mary W. Hall,et al.  CHiLL : A Framework for Composing High-Level Loop Transformations , 2007 .

[3]  Martin Griebl,et al.  Index Set Splitting , 2000, International Journal of Parallel Programming.

[4]  Patrice Quinton,et al.  High-Level Synthesis of Loops Using the Polyhedral Model , 2008 .

[5]  Douglas Crockford,et al.  The application/json Media Type for JavaScript Object Notation (JSON) , 2006, RFC.

[6]  J. Ramanujam,et al.  Automatic C-to-CUDA Code Generation for Affine Programs , 2010, CC.

[7]  Pierre Jouvelot,et al.  Semantical interprocedural parallelization: an overview of the PIPS project , 1991 .

[8]  Mahmut T. Kandemir,et al.  A hyperplane based approach for optimizing spatial locality in loop nests , 1998, ICS '98.

[9]  Sanjay V. Rajopadhye,et al.  Generation of Efficient Nested Loops from Polyhedra , 2000, International Journal of Parallel Programming.

[10]  William Pugh,et al.  A unifying framework for iteration reordering transformations , 1995, Proceedings 1st International Conference on Algorithms and Architectures for Parallel Processing.

[11]  Nicolas Vasilache,et al.  Joint Scheduling and Layout Optimization to Enable Multi-Level Vectorization , 2012 .

[12]  William J. Dally,et al.  Programmable Stream Processors , 2003, Computer.

[13]  Uday Bondhugula,et al.  A model for fusion and code motion in an automatic parallelizing compiler , 2010, 2010 19th International Conference on Parallel Architectures and Compilation Techniques (PACT).

[14]  David Parello,et al.  Semi-Automatic Composition of Loop Transformations for Deep Parallelism and Memory Hierarchies , 2006, International Journal of Parallel Programming.

[15]  Paul Feautrier,et al.  Dataflow analysis of array and scalar references , 1991, International Journal of Parallel Programming.

[16]  Sven Verdoolaege,et al.  isl: An Integer Set Library for the Polyhedral Model , 2010, ICMS.

[17]  Albert Cohen,et al.  The Polyhedral Model Is More Widely Applicable Than You Think , 2010, CC.

[18]  Albert Cohen,et al.  GRAPHITE Two Years After First Lessons Learned From Real-World Polyhedral Compilation , 2010 .

[19]  Robert A. van Engelen,et al.  Efficient Symbolic Analysis for Optimizing Compilers , 2001, CC.

[20]  Steven W. K. Tjiang,et al.  SUIF: an infrastructure for research on parallelizing and optimizing compilers , 1994, SIGP.

[21]  Paul S. Wang,et al.  Chains of recurrences—a method to expedite the evaluation of closed-form functions , 1994, ISSAC '94.

[22]  Martin Griebl,et al.  The Loop Parallelizer LooPo-Announcement , 1996, LCPC.

[23]  Vincent Loechner,et al.  Precise Data Locality Optimization of Nested Loops , 2004, The Journal of Supercomputing.

[24]  Paul Feautrier,et al.  Automatic Parallelization in the Polytope Model , 1996, The Data Parallel Programming Model.

[25]  William Pugh,et al.  An Exact Method for Analysis of Value-based Array Data Dependences , 1993, LCPC.

[26]  Robert J. Harrison,et al.  Model-Driven SIMD Code Generation for a Multi-resolution Tensor Kernel , 2011, 2011 IEEE International Parallel & Distributed Processing Symposium.

[27]  Bennet S. Yee,et al.  Native Client: A Sandbox for Portable, Untrusted x86 Native Code , 2009, 2009 30th IEEE Symposium on Security and Privacy.

[28]  Hongbin Zheng,et al.  Polly – Polyhedral optimization in LLVM , 2012 .

[29]  Cédric Bastoul,et al.  Code generation in the polyhedral model is easier than you think , 2004, Proceedings. 13th International Conference on Parallel Architecture and Compilation Techniques, 2004. PACT 2004..

[30]  Jason Helge Anderson,et al.  LegUp: high-level synthesis for FPGA-based processor/accelerator systems , 2011, FPGA '11.