Understanding loops: The influence of the decomposition of Karp, Miller, and Winograd

Loops are a fundamental control structure in programming languages. The ability to analyze, transform, and optimize loops is a key feature for compilers: it lets them handle repetitive computation schemes with a complexity proportional to the size of the program rather than to the number of operations it describes. This is true for the generation of optimized software as well as for the generation of hardware, and for both sequential and parallel execution. The goal of this talk is to recall one of the most important theories for understanding loops, the decomposition of Karp, Miller, and Winograd (1967) for systems of uniform recurrence equations, and its connections with two different lines of work on loops: the theory of transformation and parallelization of (nested) DO loops, and the theory of ranking functions for proving the termination of (imperative) programs with WHILE loops. Other connections, which will not be covered, include reachability problems in vector addition systems and Petri nets.
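To make the second connection concrete: a linear ranking function maps each program state to a value that is bounded below inside the loop and strictly decreases on every iteration, which proves termination. The following minimal sketch in Python is purely illustrative (the loop, the function names, and the candidate ranking function are not taken from the talk):

```python
def loop_trace(x, y):
    """Execute a simple WHILE loop and record the sequence of states.

    The loop modeled here is:
        while x > 0 and y > 0:
            if x > y: x = x - 1
            else:     y = y - 1
    """
    states = [(x, y)]
    while x > 0 and y > 0:
        if x > y:
            x -= 1
        else:
            y -= 1
        states.append((x, y))
    return states

def rank(state):
    # Candidate linear ranking function r(x, y) = x + y.
    # Inside the loop (x > 0 and y > 0) it satisfies r >= 2, and each
    # iteration decrements exactly one variable by 1, so r strictly
    # decreases: the loop must terminate.
    x, y = state
    return x + y

states = loop_trace(7, 4)
assert all(rank(s) > rank(t) for s, t in zip(states, states[1:]))
```

Synthesis methods such as those of Podelski and Rybalchenko [10] or Feautrier et al. [12] search for such (possibly multi-dimensional) affine functions automatically, using linear programming over the loop's transition relation.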

[1] Thomas Kailath, et al. Regular iterative algorithms and their implementation on processor arrays. Proc. IEEE, 1988.

[2] Ken Kennedy, et al. Automatic translation of FORTRAN programs to vector form. TOPL, 1987.

[3] Sailesh K. Rao, et al. Regular iterative algorithms and their implementations on processor arrays. 1986.

[4] Frédéric Vivien, et al. Optimal Fine and Medium Grain Parallelism Detection in Polyhedral Reduced Dependence Graphs. International Journal of Parallel Programming, 2004.

[5] John E. Hopcroft, et al. On the Reachability Problem for 5-Dimensional Vector Addition Systems. Theor. Comput. Sci., 1976.

[6] Henny B. Sipma, et al. Practical Methods for Proving Program Termination. CAV, 2002.

[7] Thomas Kailath, et al. Derivation, extensions and parallel implementation of regular iterative algorithms. 1989.

[8] Frédéric Mesnard, et al. The Automatic Synthesis of Linear Ranking Functions: The Complete Unabridged Version. ArXiv, 2010.

[9] Yves Robert, et al. Scheduling and Automatic Parallelization. Birkhäuser Boston, 2000.

[10] Henny B. Sipma, et al. Synthesis of Linear Ranking Functions. TACAS, 2001.

[11] Frédéric Vivien. On the optimality of Feautrier's scheduling algorithm. Concurr. Comput. Pract. Exp., 2003.

[12] Paul Feautrier, et al. Multi-dimensional Rankings, Program Termination, and Complexity Bounds of Flowchart Programs. SAS, 2010.

[13] Sanjay V. Rajopadhye, et al. On Synthesizing Systolic Arrays from Recurrence Equations with Linear Dependencies. FSTTCS, 1986.

[14] Frédéric Vivien, et al. On the Optimality of Allen and Kennedy's Algorithm for Parallelism Extraction in Nested Loops. Parallel Algorithms Appl., 1996.

[15] Alain Darte, et al. Complexity of Multi-dimensional Loop Alignment. STACS, 2002.

[16] David K. Smith. Theory of Linear and Integer Programming. 1987.

[17] Patrice Quinton. Automatic synthesis of systolic arrays from uniform recurrent equations. ISCA '84, 1984.

[18] Henny B. Sipma, et al. Linear Ranking with Reachability. CAV, 2005.

[19] Paul Feautrier, et al. Construction of Do Loops from Systems of Affine Constraints. Parallel Process. Lett., 1995.

[20] Richard M. Karp, et al. The Organization of Computations for Uniform Recurrence Equations. JACM, 1967.

[21] Paul Feautrier, et al. Some efficient solutions to the affine scheduling problem. I. One-dimensional time. International Journal of Parallel Programming, 1992.

[22] Dan I. Moldovan, et al. On the Analysis and Synthesis of VLSI Algorithms. IEEE Transactions on Computers, 1982.

[23] A. Darte. Mathematical Tools for Loop Transformations: From Systems of Uniform Recurrence Equations to the Polytope Model. 1999.

[24] Yves Robert, et al. Constructive Methods for Scheduling Uniform Loop Nests. IEEE Trans. Parallel Distributed Syst., 1994.

[25] P. Feautrier. Parametric integer programming. 1988.

[26] Sumit Gulwani, et al. SPEED: precise and efficient static estimation of program computational complexity. POPL '09, 2009.

[27] Paul Feautrier, et al. Program Termination and Worst Time Complexity with Multi-Dimensional Affine Ranking Functions. 2009.

[28] Monica S. Lam, et al. A Loop Transformation Theory and an Algorithm to Maximize Parallelism. IEEE Trans. Parallel Distributed Syst., 1991.

[29] Patrice Quinton, et al. Scheduling affine parameterized recurrences by means of variable dependent timing functions. 1990.

[30] H. T. Kung. Why systolic architectures? Computer, 1982.

[31] Nicolas Halbwachs, et al. Automatic discovery of linear restraints among variables of a program. POPL, 1978.

[32] Paul Feautrier, et al. Dataflow analysis of array and scalar references. International Journal of Parallel Programming, 1991.

[33] Charles E. Leiserson, et al. Retiming synchronous circuitry. Algorithmica, 1988.

[34] Allen Van Gelder, et al. Termination detection in logic programs using argument sizes (extended abstract). PODS, 1991.

[35] Leslie Lamport. The parallel execution of DO loops. CACM, 1974.

[36] Ilse C. F. Ipsen, et al. Systolic array synthesis: computability and time cones. 1986.

[37] Paul Feautrier, et al. Some efficient solutions to the affine scheduling problem. Part II. Multidimensional time. International Journal of Parallel Programming, 1992.

[38] Frédéric Vivien, et al. Revisiting the decomposition of Karp, Miller and Winograd. Proceedings of the International Conference on Application Specific Array Processors, 1995.

[39] Yves Robert, et al. Linear Scheduling Is Nearly Optimal. Parallel Process. Lett., 1991.