Incremental Loop Transformations and Enumeration of Parametric Sets (Incrementele lustransformaties en enumeratie van parametrische verzamelingen)

The geometrical model is a powerful tool for program analysis and optimization. It forms the basis for the two parts of this dissertation: a methodology for incremental loop transformations and an efficient enumeration technique for parametric integer sets.

Power consumption in typical embedded multimedia applications is dominated by the storage of, and the access to, the large multi-dimensional arrays they manipulate. It is now well known that a design methodology for reducing power consumption and improving system performance should apply global loop transformations to increase the locality and regularity of data accesses. In the first part of this dissertation, we propose a two-step global loop transformation approach consisting of a linear transformation that focuses mainly on regularity and a translation that focuses on locality. We further develop a refined regularity criterion and show how to perform the translation step incrementally, which allows multiple complicated cost functions to be evaluated.

Many compiler optimization techniques depend on the enumeration of parametric integer sets defined by linear equations. In the second part of this dissertation, we present the first implementation of Barvinok’s algorithm applied to the enumeration of parametric polytopes. It extends an earlier implementation of this algorithm that handled only a subclass of the enumeration problems we consider, and it significantly improves on another implementation based on a different technique. The resulting enumerator may be obtained either as an explicit function or as a generating function. We further show that these two representations are polynomially interconvertible, and we discuss some approaches for handling generalized enumeration problems.
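As a toy illustration of what enumerating a parametric integer set means, consider the set of integer points (i, j) with 0 <= i <= j <= n, where n is the parameter. This example is not taken from the dissertation and does not use Barvinok's algorithm; it is a minimal sketch in which the function names are hypothetical. It compares a brute-force count with an explicit enumerator, the polynomial (n + 1)(n + 2) / 2, and with the coefficients of the generating function 1 / (1 - x)^3, whose coefficient of x^n gives the same count.

```python
def count_brute_force(n):
    """Count integer points (i, j) with 0 <= i <= j <= n by direct enumeration."""
    return sum(1 for j in range(n + 1) for i in range(j + 1))


def count_closed_form(n):
    """Explicit enumerator for the same set: the polynomial (n + 1)(n + 2) / 2."""
    return (n + 1) * (n + 2) // 2


def series_coefficients(terms):
    """First `terms` coefficients of the power series of 1 / (1 - x)^3.

    Dividing a power series by (1 - x) replaces its coefficients by their
    prefix sums, so three prefix-sum passes over the series of the constant 1
    give the coefficients of 1 / (1 - x)^3.
    """
    coeffs = [1] + [0] * (terms - 1)
    for _ in range(3):
        running = 0
        for k in range(terms):
            running += coeffs[k]
            coeffs[k] = running
    return coeffs


if __name__ == "__main__":
    gf = series_coefficients(8)
    for n in range(8):
        # All three columns should agree for every parameter value n.
        print(n, count_brute_force(n), count_closed_form(n), gf[n])
```

The brute-force count takes time that grows with n, whereas the explicit function and the generating function describe the count for all parameter values at once. That symbolic description is the kind of result the abstract refers to as an enumerator.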
