Loop Shifting for Loop Compaction

The idea of decomposed software pipelining is to decouple the software pipelining problem into a cyclic scheduling problem without resource constraints and an acyclic scheduling problem with resource constraints. In terms of loop transformation and code motion, the technique can be formulated as a combination of loop shifting and loop compaction. Loop shifting moves statements between iterations, thereby changing some loop-independent dependences into loop-carried dependences and vice versa. Loop compaction then schedules the body of the loop, considering only loop-independent dependences but taking into account the details of the target architecture. In this paper, we show how loop shifting can be optimized so as to minimize both the length of the critical path and the number of dependences seen by loop compaction. The first problem is well known and can be solved by an algorithm due to Leiserson and Saxe. We show that the second optimization, and its combination with the first, is also polynomially solvable with a fast graph algorithm that is a variant of minimum-cost flow algorithms. Finally, we analyze the improvements obtained for loop compaction through experiments on random graphs.
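To make the two objectives concrete, here is a minimal Python sketch of what a shift does to a dependence graph. It is not the algorithm of the paper: it only evaluates a shift that is already given and does not search for an optimal one. The names `shift` and `compaction_cost`, the edge format `(source, target, distance)`, and the retiming-style convention that an edge u -> v of distance d becomes d + r[v] - r[u] are all assumptions made for illustration.

```python
from functools import lru_cache


def shift(edges, r):
    """Apply a shift r (statement -> integer) to a dependence graph.

    Assumed convention: an edge (u, v, d) gets the new distance
    d + r[v] - r[u]. A shift is legal only if no distance becomes negative.
    """
    shifted = []
    for u, v, d in edges:
        new_d = d + r[v] - r[u]
        if new_d < 0:
            raise ValueError(f"illegal shift: edge {u}->{v} gets distance {new_d}")
        shifted.append((u, v, new_d))
    return shifted


def compaction_cost(nodes, edges, latency):
    """Return (number of loop-independent edges, critical path length).

    Only distance-0 edges constrain loop compaction. The subgraph they form
    is assumed to be acyclic, which holds when every dependence cycle has a
    positive total distance.
    """
    zero_edges = [(u, v) for u, v, d in edges if d == 0]
    successors = {n: [] for n in nodes}
    for u, v in zero_edges:
        successors[u].append(v)

    @lru_cache(maxsize=None)
    def longest_from(n):
        # Longest latency-weighted path that starts at statement n.
        return latency[n] + max((longest_from(s) for s in successors[n]), default=0)

    return len(zero_edges), max(longest_from(n) for n in nodes)


# Hypothetical loop body: A -> B -> C within an iteration, C -> A two iterations later.
nodes = ["A", "B", "C"]
latency = {"A": 1, "B": 1, "C": 1}
edges = [("A", "B", 0), ("B", "C", 0), ("C", "A", 2)]

before = compaction_cost(nodes, edges, latency)   # (2, 3)
after = compaction_cost(nodes, shift(edges, {"A": 0, "B": 0, "C": 1}), latency)  # (1, 2)
```

In this toy graph, shifting C by one iteration turns the dependence B -> C into a loop-carried one, so compaction sees one dependence instead of two and a critical path of length 2 instead of 3. The total distance around the cycle stays at 2 under any shift, so at least one of its three edges must remain loop-independent.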
