Paper information: GPMB—software pipelining branch-intensive loops

To achieve higher instruction-level parallelism, the constraint imposed by a single control flow must be relaxed: control operations should execute in parallel, just as data operations do. We present a new software pipelining method called GPMB (Global Pipelining with Multiple Branches), which is based on architectures that support multi-way branching and multiple control flows. Preliminary experimental results show that, for IF-less loops, GPMB performs as well as modulo scheduling, and that, for branch-intensive loops, GPMB performs much better than software pipelining constrained to one two-way branch per cycle.
