Stage scheduling: a technique to reduce the register requirements of a modulo schedule

Modulo scheduling is an efficient technique for exploiting instruction-level parallelism in a variety of loops, resulting in high-performance code but increased register requirements. We present a set of low-computational-complexity stage-scheduling heuristics that reduce the register requirements of a given modulo schedule by shifting operations by multiples of II cycles, where II is the initiation interval of the schedule. Measurements on a benchmark suite of 1289 loops from the Perfect Club, SPEC-89, and the Livermore Fortran Kernels show that our best heuristic achieves on average 99% of the decrease in register requirements obtained by an optimal stage scheduler.
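To make the idea of shifting operations by multiples of II concrete, the following is a minimal sketch and not the heuristics of the paper. It assumes a simplified register model in which a value occupies a register from the issue cycle of its producer to the issue cycle of its last consumer; the helper names (`lifetimes`, `max_live`, `is_legal`, `shift`) and the three-operation loop are hypothetical and chosen only for illustration.

```python
# Illustrative sketch only: a toy register model, not the paper's algorithm.
# An edge is (producer, consumer, latency, distance), where distance is the
# number of loop iterations separating the producer from the consumer.

def lifetimes(times, edges, ii):
    """Return {producer: (start, end)} with end = issue time of the last use."""
    end = {}
    for prod, cons, _latency, distance in edges:
        use = times[cons] + distance * ii
        end[prod] = max(end.get(prod, times[prod]), use)
    return {prod: (times[prod], e) for prod, e in end.items()}

def max_live(times, edges, ii):
    """Largest number of simultaneously live values in the steady state."""
    live = [0] * ii
    for start, end in lifetimes(times, edges, ii).values():
        # Each cycle of a lifetime overlaps row (t mod II) of the kernel once.
        for t in range(start, end):
            live[t % ii] += 1
    return max(live)

def is_legal(times, edges, ii):
    """Check that every dependence still has its latency satisfied."""
    return all(times[c] + d * ii - times[p] >= lat for p, c, lat, d in edges)

def shift(times, op, stages, ii):
    """Move one operation by a whole number of stages (multiples of II)."""
    moved = dict(times)
    moved[op] += stages * ii
    return moved

II = 2
edges = [("ld", "mul", 1, 0), ("mul", "st", 1, 0)]
schedule = {"ld": 0, "mul": 1, "st": 6}  # hypothetical modulo schedule

before = max_live(schedule, edges, II)

# Move ld and mul two stages later, closer to the store that ends the chain.
staged = shift(shift(schedule, "ld", 2, II), "mul", 2, II)
after = max_live(staged, edges, II)

print(before, after)  # 3 1
```

Because each operation moves by a multiple of II, its row in the kernel (issue time modulo II) is unchanged, so resource usage stays the same while the lifetimes shrink. In this toy example, shifting `mul` alone would not help, since the value produced by `ld` would then stay live longer; the gain comes from moving both operations together.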
