Resource-Constrained Software Pipelining

This paper presents a software pipelining algorithm for automatically extracting fine-grain parallelism from general loops. The algorithm handles machine resource constraints in a way that integrates their management smoothly with software pipelining. Furthermore, the algorithm does not sacrifice generality to handle resource constraints, and it makes scheduling choices using truly global information. Proofs of correctness and the results of experiments with an implementation are also presented.
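The abstract does not describe the algorithm's internals, so the sketch below does not reproduce it. It only illustrates what "machine resource constraints" mean for a software pipeliner, using the resource check from modulo scheduling. Modulo scheduling is a different, widely used pipelining technique and is named here plainly as a swap-in. The machine model (one-cycle, fully pipelined units grouped into resource classes) and the helper names `res_mii` and `fits_modulo` are hypothetical. When iterations start every II cycles, operations whose issue cycles agree modulo II compete for the same units in steady state. This gives a resource-based lower bound on II and a simple feasibility test. Data dependences are deliberately ignored.

```python
import math
from collections import Counter


def res_mii(ops, units):
    """Resource-constrained lower bound on the initiation interval (II).

    ops   -- resource-class name for each operation in the loop body
    units -- resource class -> number of units available per cycle
    Assumes (hypothetically) that each op occupies one unit for one cycle.
    """
    demand = Counter(ops)
    return max(math.ceil(n / units[r]) for r, n in demand.items())


def fits_modulo(schedule, ops, units, ii):
    """Check a schedule (op index -> issue cycle) against a modulo
    reservation table. Ops whose cycles agree modulo II overlap once
    iterations are pipelined. Data dependences are NOT checked here."""
    table = Counter((schedule[i] % ii, ops[i]) for i in range(len(ops)))
    return all(count <= units[r] for (_, r), count in table.items())


# Hypothetical loop body: three ALU ops, two memory ops, one multiply.
ops = ["alu", "alu", "alu", "mem", "mem", "mul"]
units = {"alu": 2, "mem": 1, "mul": 1}

ii = res_mii(ops, units)
print(ii)  # 2
print(fits_modulo([0, 1, 2, 0, 1, 3], ops, units, ii))  # True
print(fits_modulo([0, 2, 4, 0, 1, 3], ops, units, ii))  # False: 3 ALU ops share slot 0
```

A real resource-constrained pipeliner must satisfy both this kind of resource check and the loop's dependence constraints at the same time. The sketch isolates only the resource side.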