Paper information - Superblock formation using static program analysis

Superblock formation using static program analysis

To achieve higher instruction-level parallelism, the constraint imposed by a single control flow must be relaxed. Control operations should execute in parallel, just as data operations do. We present a new software pipelining method called GPMB (Global Pipelining with Multiple Branches), which is based on architectures that support multi-way branching and multiple control flows. Preliminary experimental results show that for IF-less loops GPMB performs as well as modulo scheduling, and that for branch-intensive loops GPMB performs much better than software pipelining under the constraint of one two-way branch per cycle.

[1] Gerrit A. Slavenburg, et al. CREATE-LIFE: a modular design approach for high performance ASICs, 1990, Digest of Papers Compcon Spring '90. Thirty-Fifth IEEE Computer Society International Conference on Intellectual Leverage.

[2] Monica S. Lam, et al. Limits of control flow on parallelism, 1992, ISCA '92.

[3] Kemal Ebcioglu, et al. A compilation technique for software pipelining of loops with conditional jumps, 1987, MICRO 20.

[4] Jian Wang, et al. URPR-1: A single-chip VLIW architecture, 1993, Microprocess. Microprogramming.

[5] Vicki H. Allan, et al. Software pipelining: an evaluation of enhanced pipelining, 1991, MICRO 24.

[6] Alfred V. Aho, et al. Compilers: Principles, Techniques, and Tools, 1986, Addison-Wesley series in computer science / World student series edition.

[7] Wen-mei W. Hwu, et al. Trace selection for compiling large C application programs to microcode, 1988, MICRO 1988.

[8] Andrew Wolfe, et al. A variable instruction stream extension to the VLIW architecture, 1991, ASPLOS IV.

[9] Ken Kennedy, et al. Conversion of control dependence to data dependence, 1983, POPL '83.

[10] Alan Jay Smith, et al. Branch Prediction Strategies and Branch Target Buffer Design, 1995, Computer.

[11] Bogong Su, et al. A VLIW architecture for optimal execution of branch-intensive loops, 1992, MICRO 1992.

[12] Alexander Aiken, et al. Optimal loop parallelization, 1988, PLDI '88.

[13] J. A. Fisher, et al. Instruction-Level Parallel Processing, 1991, Science.

[14] Bogong Su, et al. URPR—An extension of URCR for software pipelining, 1986, MICRO 19.

[15] Joseph A. Fisher, et al. Predicting conditional branch directions from previous runs of a program, 1992, ASPLOS V.

[16] Scott A. Mahlke, et al. IMPACT: an architectural framework for multiple-instruction-issue processors, 1991, ISCA '91.

[17] Joseph A. Fisher, et al. Trace Scheduling: A Technique for Global Microcode Compaction, 1981, IEEE Transactions on Computers.

[18] Robert B. Murray, et al. Compiling for the CRISP Microprocessor, 1987, COMPCON.

[19] Augustus K. Uht. Requirements for Optimal Execution of Loops with Tests, 1992, IEEE Trans. Parallel Distributed Syst.

[20] Scott A. Mahlke, et al. Compiler code transformations for superscalar-based high-performance systems, 1992, Proceedings Supercomputing '92.

[21] David W. Wall, et al. Limits of instruction-level parallelism, 1991, ASPLOS IV.

[22] Jinshi Xia, et al. GURPR - a method for Global Software pipelining, 1988, SIGM.

[23] James R. Larus, et al. Branch prediction for free, 1993, PLDI '93.

[24] Edward S. Davidson, et al. Highly concurrent scalar processing, 1986, ISCA 1986.

[25] Scott A. Mahlke, et al. Using profile information to assist classic code optimizations, 1991, Softw. Pract. Exp.