Superblock formation using static program analysis

To achieve higher instruction-level parallelism, the constraint imposed by a single control flow must be relaxed: control operations should execute in parallel, just as data operations do. We present a new software pipelining method, GPMB (Global Pipelining with Multiple Branches), which targets architectures that support multi-way branching and multiple control flows. Preliminary experimental results show that, for IFless loops, GPMB performs as well as modulo scheduling, and that for branch-intensive loops it performs much better than software pipelining restricted to one two-way branch per cycle.
