Paper information - Lifetime-sensitive modulo scheduling

This paper shows how to software pipeline a loop for minimal register pressure without sacrificing the loop's minimum execution time. This novel bidirectional slack-scheduling method has been implemented in a FORTRAN compiler and tested on many scientific benchmarks. The empirical results—when measured against an absolute lower bound on execution time, and against a novel schedule-independent absolute lower bound on register pressure—indicate near-optimal performance.

Richard A. Huff
