Modulo schedule buffers

As VLIW/EPIC processors are increasingly used in real-time, signal-processing, and embedded applications, the importance of minimizing code size and reducing power is growing. This paper describes a new architectural mechanism, called Modulo Schedule Buffers, that provides an elegant interface for the execution of modulo-scheduled loops. While the performance is similar to that of kernel-only modulo scheduling, this mechanism has a number of advantages, including minimal code expansion. Rather than generating fully scheduled kernels, the compiler generates a sequential form of the modulo-scheduled loop body. Using the sequential form, the hardware internally synthesizes the prologue, kernel, and epilogue. In addition, while-loops can be scheduled with fewer constraints and fewer explicit prologues/epilogues than with existing mechanisms. Because the hardware controls loop execution, the burden of modulo-schedule loop control is lifted from the predicate register file, allowing for a less rigorous predication implementation. Finally, hardware control limits the interrupt latency when using the EQ explicit latency model to the execution latency of one iteration, rather than that of the whole loop invocation.
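The idea of deriving the prologue, kernel, and epilogue from a single sequential loop body can be illustrated with a small software model. The sketch below is not the hardware design described in the paper; it only assumes that each operation in the sequential form carries a scheduled time within one iteration, and that a new iteration starts every `ii` cycles (the initiation interval). The function name `expand_modulo_schedule`, the operation names, and the phase labels are hypothetical and chosen for illustration.

```python
def expand_modulo_schedule(ops, ii, n_iters):
    """Expand a sequential loop body into a cycle-by-cycle issue trace.

    ops     -- list of (name, sched_time) for ONE iteration, in sequential order
    ii      -- initiation interval: a new iteration starts every `ii` cycles
    n_iters -- number of loop iterations (assumed >= number of stages)

    Returns a list of (cycle, phase, issued), where `issued` holds
    (op_name, iteration_index) pairs issued in that cycle.
    """
    n_stages = max(t for _, t in ops) // ii + 1
    total_cycles = (n_iters + n_stages - 1) * ii
    trace = []
    for cycle in range(total_cycles):
        kernel_pass, row = divmod(cycle, ii)
        issued = []
        for name, sched_time in ops:
            stage, op_row = divmod(sched_time, ii)
            iteration = kernel_pass - stage  # iteration this stage works on
            if op_row == row and 0 <= iteration < n_iters:
                issued.append((name, iteration))
        if kernel_pass < n_stages - 1:
            phase = "prologue"   # pipeline still filling
        elif kernel_pass < n_iters:
            phase = "kernel"     # all stages active
        else:
            phase = "epilogue"   # pipeline draining
        trace.append((cycle, phase, issued))
    return trace


# Hypothetical four-operation loop body scheduled over two stages (ii = 2).
body = [("load", 0), ("mul", 1), ("add", 2), ("store", 3)]
trace = expand_modulo_schedule(body, ii=2, n_iters=3)
for cycle, phase, issued in trace:
    print(cycle, phase, issued)
```

In this model the code stored for the loop is just the four-entry `body` list; the filling and draining phases fall out of the bounds check on `iteration`, with no separately stored prologue or epilogue code, which mirrors the code-size argument made above.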
