Enhanced modulo scheduling for loops with conditional branches

Loops with conditional branches have multiple execution paths, which makes them difficult to software pipeline. The modulo scheduling technique for software pipelining addresses this problem by converting loops with conditional branches into straight-line code before scheduling. In this paper we present an Enhanced Modulo Scheduling (EMS) technique that can achieve a lower minimum Initiation Interval than modulo scheduling techniques that rely on either Hierarchical Reduction or If-conversion with Predicated Execution. These three modulo scheduling techniques have been implemented in a prototype compiler. We show that for existing architectures that support one branch per cycle, EMS performs approximately 18% better than Hierarchical Reduction. We also show that If-conversion with Predicated Execution outperforms EMS assuming one branch per cycle. However, with hardware support for multiple branches per cycle, EMS should perform as well as or better than If-conversion with Predicated Execution.
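
To make the role of the branch-per-cycle limit concrete, the following minimal sketch computes the standard resource-constrained lower bound on the Initiation Interval: for each resource class, the number of operations per iteration divided by the number of available units, rounded up, with the maximum taken over all classes. It is an illustration only, not the EMS algorithm described in the paper. The function name res_mii, the resource classes, and all operation and unit counts are hypothetical.

```python
from math import ceil

def res_mii(uses, units):
    """Resource-constrained lower bound on the Initiation Interval.

    uses[r]  = operations per loop iteration that need resource class r
    units[r] = number of units of class r available each cycle
    """
    return max(ceil(uses[r] / units[r]) for r in uses)

# Hypothetical loop body (counts are illustrative, not taken from the paper).
uses = {"alu": 4, "mem": 2, "branch": 3}

# Two hypothetical machines that differ only in branch units per cycle.
one_branch = {"alu": 2, "mem": 1, "branch": 1}
two_branch = {"alu": 2, "mem": 1, "branch": 2}

ii_one = res_mii(uses, one_branch)  # branch class is the bottleneck
ii_two = res_mii(uses, two_branch)  # extra branch unit relaxes the bound

print(ii_one, ii_two)
```

In this made-up configuration the branch class alone sets the bound on the one-branch machine, so adding a second branch unit lowers it. That is the kind of effect the abstract refers to when it says hardware support for multiple branches per cycle should help EMS; the actual minimum Initiation Interval also depends on recurrence constraints, which this sketch ignores.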
