Optimal software pipelining of nested loops

The article presents an approach to software pipelining of nested loops. While several articles have addressed software pipelining of single (non-nested) loops, little work has been done on applying it to nested loops. The article solves the problem of finding the minimum iteration initiation interval (in the absence of resource constraints) for each level of a nested loop. The problem is formulated as one of finding a rational quasi-affine schedule for each statement in the body of a perfectly nested loop, and this formulation is then solved using linear programming. This allows us to treat iteration-dependent statement reordering and multidimensional loop unrolling in the same framework. Unlike most work on scheduling nested loops, we treat each statement in the body as a unit of scheduling. Thus, the derived schedules allow instances of statements from different iterations to be scheduled at the same time. In the absence of resource constraints, the optimal schedules derived here subsume existing work on software pipelining of non-nested loops.
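
To make the formulation concrete, the following is a minimal sketch and not the article's algorithm. It assumes a hypothetical two-statement, two-level loop nest with made-up dependence distances and latencies, and it assumes the objective is the sum of the per-level initiation intervals. A statement S at iteration vector i is scheduled at time floor(h . i + c_S), where h holds the per-level initiation intervals and c_S is the statement offset. Each dependence from S to T with distance d and latency l requires h . d + c_T - c_S >= l. Instead of a linear programming solver, the sketch scans a small grid of rational values for h and checks each candidate with a longest-path relaxation; all names (DEPS, offsets_for, best_schedule, issue_time) are illustrative.

```python
from fractions import Fraction
from itertools import product
from math import floor

# Hypothetical dependence graph: (source, target, distance vector, latency).
DEPS = [
    ("S1", "S2", (0, 0), 1),  # S2 uses S1's result in the same iteration
    ("S2", "S1", (1, 0), 1),  # carried by the outer loop
    ("S1", "S1", (0, 1), 1),  # carried by the inner loop
]
STATEMENTS = ["S1", "S2"]


def offsets_for(h, deps=DEPS, statements=STATEMENTS):
    """Return offsets c with h.d + c[t] - c[s] >= latency on every edge, or None."""
    c = {s: Fraction(0) for s in statements}
    # Longest-path relaxation: enforce c[t] >= c[s] + (latency - h.d).
    for _ in range(len(statements)):
        changed = False
        for src, dst, dist, lat in deps:
            w = lat - sum(hk * dk for hk, dk in zip(h, dist))
            if c[src] + w > c[dst]:
                c[dst] = c[src] + w
                changed = True
        if not changed:
            return c
    return None  # still changing after |V| passes: a positive cycle, so infeasible


def best_schedule(max_value=4, denom=2):
    """Scan rational h on a grid; keep the feasible h with the smallest sum (assumed objective)."""
    grid = [Fraction(n, denom) for n in range(max_value * denom + 1)]
    best = None
    for h in product(grid, repeat=2):
        c = offsets_for(h)
        if c is not None and (best is None or sum(h) < sum(best[0])):
            best = (h, c)
    return best


def issue_time(h, c, stmt, iteration):
    """Quasi-affine schedule: floor of a rational affine function of the iteration vector."""
    return floor(sum(hk * ik for hk, ik in zip(h, iteration)) + c[stmt])


if __name__ == "__main__":
    h, c = best_schedule()
    print("initiation intervals per level:", h)
    print("statement offsets:", c)
    print("S2 at iteration (1, 2) issues at time", issue_time(h, c, "S2", (1, 2)))
```

In this toy instance the outer-loop interval is bounded below by the two-statement recurrence and the inner-loop interval by the self-dependence of S1. A linear programming solver would replace the grid scan in a realistic setting.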
