Sub-polyhedral scheduling using (unit-)two-variable-per-inequality polyhedra

Polyhedral compilation has been successful in the design and implementation of complex loop nest optimizers and parallelizing compilers. Its algorithmic complexity and limited scalability remain an important weakness. We address this weakness using sub-polyhedral under-approximations of the systems of constraints that arise in affine scheduling problems. We propose a sub-polyhedral scheduling technique using (Unit-)Two-Variable-Per-Inequality, or (U)TVPI, polyhedra. This technique relies on simple polynomial-time algorithms to under-approximate a general polyhedron by (U)TVPI polyhedra. We modify the state-of-the-art PLuTo compiler to use our scheduling technique, and show that for a majority of the Polybench (2.0) kernels, these under-approximations yield non-empty polyhedra. Solving the under-approximated system leads to asymptotic gains in complexity, and shows practically significant improvements over a traditional LP solver. We also verify that code generated by our sub-polyhedral parallelization prototype matches the performance of PLuTo-optimized code when the under-approximation preserves feasibility.
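To make the notion of under-approximation concrete, the following is a minimal sketch and not necessarily the algorithm used in the paper. It assumes one simple strategy: a constraint with more than two variables is split into groups of at most two variables, and the right-hand side is divided evenly among the groups. Any point satisfying all the resulting TVPI constraints also satisfies the original constraint, so the result is a subset of the original polyhedron, possibly an empty one. The names `split_to_tvpi`, `is_utvpi`, and `satisfies` are hypothetical and introduced only for illustration.

```python
from fractions import Fraction


def split_to_tvpi(coeffs, bound):
    """Under-approximate sum(coeffs[v] * v) <= bound by TVPI constraints.

    coeffs maps variable names to coefficients; bound is the right-hand side.
    Returns a list of (coeffs, bound) pairs with at most two variables each.
    The even split of the bound is an illustrative assumption.
    """
    items = [(v, c) for v, c in sorted(coeffs.items()) if c != 0]
    if len(items) <= 2:
        return [(dict(items), Fraction(bound))]
    # Pair up variables in sorted order; the last group may hold one variable.
    groups = [items[i:i + 2] for i in range(0, len(items), 2)]
    share = Fraction(bound) / len(groups)
    return [(dict(group), share) for group in groups]


def is_utvpi(coeffs):
    """True if the constraint has at most two variables with coefficients +1 or -1."""
    nonzero = [c for c in coeffs.values() if c != 0]
    return len(nonzero) <= 2 and all(c in (1, -1) for c in nonzero)


def satisfies(constraint, point):
    """Check a single constraint (coeffs, bound) at a point (dict of values)."""
    coeffs, bound = constraint
    return sum(c * point[v] for v, c in coeffs.items()) <= bound


# Original constraint: w + x - 2y + 3z <= 4
original = ({"w": 1, "x": 1, "y": -2, "z": 3}, 4)
tvpi = split_to_tvpi(*original)

inside = {"w": 1, "x": 1, "y": 0, "z": 0}   # satisfies both systems
lost = {"w": 0, "x": 4, "y": 0, "z": 0}     # satisfies only the original
```

In this example the point `lost` lies in the original polyhedron but outside the under-approximation, which illustrates why feasibility is not always preserved.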
