Reduced code size modulo scheduling in the absence of hardware support

Modulo scheduling is a very effective instruction scheduling technique that exploits Instruction Level Parallelism (ILP) in loop bodies by overlapping the execution of successive iterations. Unfortunately, modulo scheduling has been shown to cause heavy code expansion. To avoid the penalties of code expansion, some processors provide dedicated hardware support for modulo scheduled loops. However, this hardware support has a cost in chip area, cycle time, processor complexity, and compiler complexity. This paper shows that the right combination of scheduling heuristics and speculative modulo scheduling can significantly reduce code expansion. In addition, several code generation schema heuristics are proposed to reduce code expansion further. The evaluations show that loops can be effectively modulo scheduled with an average code expansion of only 1.5 times the original loop size. Compared with a state-of-the-art modulo scheduler, our code-size-sensitive heuristics reduce the size of embedded-domain benchmark binaries by 30% on average. While performance is mostly unchanged, some applications show speed-ups of up to 20% due to a reduction in instruction cache capacity misses.
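To illustrate where the code expansion comes from, the following minimal sketch counts the operations emitted for a modulo scheduled loop when no hardware support is available and the compiler must lay out an explicit prologue, kernel, and epilogue. It is an illustration under simplifying assumptions, not the schema or heuristics evaluated in the paper: the function name `explicit_schema_size` and its input format are hypothetical, size is measured in operations rather than bytes, the loop has a single exit, and no kernel unrolling for modulo variable expansion is modeled.

```python
from typing import Dict, List


def explicit_schema_size(ops_per_stage: List[int]) -> Dict[str, float]:
    """Count operations in an explicit prologue/kernel/epilogue layout.

    ops_per_stage[s] is the number of operations of one iteration that
    the modulo schedule places in stage s (hypothetical input format).
    """
    stage_count = len(ops_per_stage)
    body = sum(ops_per_stage)
    if stage_count == 0 or body <= 0:
        raise ValueError("need at least one stage with operations")

    # Ramp-up: step k of the prologue runs stages 0..k of the
    # iterations started so far (k = 0 .. stage_count - 2).
    prologue = sum(sum(ops_per_stage[: k + 1]) for k in range(stage_count - 1))

    # Steady state: the kernel contains every stage exactly once.
    kernel = body

    # Ramp-down: step k of the epilogue runs stages k..stage_count-1
    # of the iterations still in flight (k = 1 .. stage_count - 1).
    epilogue = sum(sum(ops_per_stage[k:]) for k in range(1, stage_count))

    total = prologue + kernel + epilogue
    return {
        "prologue": prologue,
        "kernel": kernel,
        "epilogue": epilogue,
        "total": total,
        "expansion": total / body,  # relative to the original loop body
    }


if __name__ == "__main__":
    # A loop body of 6 operations scheduled into 3 stages.
    print(explicit_schema_size([3, 2, 1]))
```

Under these assumptions every stage appears once in the kernel and a further `stage_count - 1` times across the prologue and epilogue, so the emitted code is `stage_count` times the original loop body. This is the kind of growth that the code-size-sensitive heuristics described above aim to limit.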
