Lifetime-sensitive modulo scheduling

This paper shows how to software-pipeline a loop for minimal register pressure without sacrificing the loop's minimum execution time. This novel bidirectional slack-scheduling method has been implemented in a FORTRAN compiler and tested on many scientific benchmarks. The empirical results, measured against an absolute lower bound on execution time and against a novel schedule-independent absolute lower bound on register pressure, indicate near-optimal performance.
