GPMB—software pipelining branch-intensive loops

To achieve higher instruction-level parallelism, the constraint imposed by a single control flow must be relaxed: control operations should execute in parallel, just as data operations do. We present a new software pipelining method called GPMB (Global Pipelining with Multiple Branches), which is based on architectures that support multi-way branching and multiple control flows. Preliminary experimental results show that, for IF-less loops, GPMB performs as well as modulo scheduling, and that, for branch-intensive loops, GPMB performs much better than software pipelining restricted to one two-way branch per cycle.
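To make the branch-issue constraint concrete, the following is a minimal sketch of a toy resource bound, not GPMB's scheduling algorithm. It assumes a loop body whose conditional branches are mutually independent and counts only the cycles needed to issue them. The function name `min_control_cycles` and the parameter `branches_per_cycle` are hypothetical and do not come from the paper.

```python
import math


def min_control_cycles(num_branches: int, branches_per_cycle: int) -> int:
    """Lower bound on cycles needed just to issue one iteration's branches.

    Toy model (an assumption, not the paper's method): branches are
    independent of each other, and the machine can issue at most
    `branches_per_cycle` two-way branch decisions in a single cycle.
    """
    if num_branches < 0:
        raise ValueError("num_branches must be non-negative")
    if branches_per_cycle < 1:
        raise ValueError("branches_per_cycle must be at least 1")
    # Each cycle retires at most `branches_per_cycle` branches.
    return math.ceil(num_branches / branches_per_cycle)


if __name__ == "__main__":
    branches = 4  # hypothetical branch-intensive loop body
    for width in (1, 2, 4):
        cycles = min_control_cycles(branches, width)
        print(f"{width} branch(es) per cycle: at least {cycles} cycle(s)")
```

Under this model, a machine limited to one two-way branch per cycle cannot complete an iteration with four branches in fewer than four cycles, however many data operations it can issue in parallel. Wider branch issue lowers that floor, which is the kind of constraint the abstract argues must be relaxed.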
