Polly's Polyhedral Scheduling in the Presence of Reductions

The polyhedral model provides a powerful mathematical abstraction to enable effective optimization of loop nests with respect to a given optimization goal, e.g., exploiting parallelism. Unexploited reduction properties are a frequent reason for polyhedral optimizers to assume parallelism-prohibiting dependences. To our knowledge, no polyhedral loop optimizer available in any production compiler provides support for reductions. In this paper, we show that leveraging the parallelism of reductions can lead to a significant performance increase. We give a precise, dependence-based definition of reductions and discuss ways to extend polyhedral optimization to exploit the associativity and commutativity of reduction computations. We have implemented a reduction-enabled scheduling approach in the Polly polyhedral optimizer and evaluate it on the standard PolyBench 3.2 benchmark suite. We were able to detect and model all 52 arithmetic reductions and achieve speedups of up to 2.21× on a quad-core machine by exploiting the multidimensional reduction in the BiCG benchmark.
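To illustrate what a parallelism-prohibiting reduction dependence looks like, the following is a minimal sketch in C and not the implementation in Polly. It is loosely modeled on the column-wise accumulation that appears in BiCG-style kernels. The function names, the sizes `N` and `M`, and the chunk count `CHUNKS` are hypothetical and chosen only for illustration. The first function carries a dependence through `s[j]` across the outer loop. The second uses privatization, one possible way to exploit associativity and commutativity: each chunk of outer iterations accumulates into its own copy, so the chunks are independent and could run in parallel, and the copies are combined afterwards.

```c
#include <stddef.h>

#define N 4      /* rows (hypothetical size) */
#define M 3      /* columns (hypothetical size) */
#define CHUNKS 2 /* number of independent partial results (assumption) */

/* Sequential form: every iteration of i reads and writes s[j], so the
 * i-loop carries a dependence through s and looks non-parallel. */
void colsum_seq(long A[N][M], const long *r, long *s) {
    for (size_t j = 0; j < M; j++)
        s[j] = 0;
    for (size_t i = 0; i < N; i++)
        for (size_t j = 0; j < M; j++)
            s[j] += r[i] * A[i][j];
}

/* Privatized form: each chunk c of the i-loop writes only partial[c],
 * so iterations of the c-loop do not depend on each other. */
void colsum_privatized(long A[N][M], const long *r, long *s) {
    long partial[CHUNKS][M] = {{0}};

    for (size_t c = 0; c < CHUNKS; c++) {
        size_t lo = c * N / CHUNKS;
        size_t hi = (c + 1) * N / CHUNKS;
        for (size_t i = lo; i < hi; i++)
            for (size_t j = 0; j < M; j++)
                partial[c][j] += r[i] * A[i][j];
    }

    /* Combine step: reorders the additions, which is valid only
     * because the operation is associative and commutative. */
    for (size_t j = 0; j < M; j++) {
        s[j] = 0;
        for (size_t c = 0; c < CHUNKS; c++)
            s[j] += partial[c][j];
    }
}
```

The sketch uses integer arithmetic so that both variants produce identical results. With floating-point data, reordering the additions can change rounding, so the two results may differ slightly.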
