Paper information - Predicated static single assignment

Predicated static single assignment

Increases in instruction-level parallelism are needed to exploit the potential parallelism available in future wide-issue architectures. Predicated execution is an architectural mechanism that increases instruction-level parallelism by removing branches and allowing simultaneous execution of multiple paths of control, committing only the instructions from the correct path. For the compiler to expose such parallelism, traditional compiler data-flow analysis needs to be extended to predicated code. In this paper we present predicated static single assignment (PSSA) to enable aggressive predicated optimization and instruction scheduling. PSSA removes false dependences by exploiting renaming and information about the multiple control paths. We demonstrate the usefulness of PSSA for predicated speculation and control height reduction. These two predicated code optimizations, used during instruction scheduling, reduce the dependence length of the critical paths through a predicated region. Our results show that using PSSA to enable speculation and control height reduction reduces execution time by 10% to 58%.
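
As a rough illustration of the mechanism the abstract describes, the sketch below contrasts a branching computation with an if-converted one in which both paths execute and only the value from the correct path is committed. It is not the paper's PSSA construction; the function names `branchy` and `predicated` and the variables `p`, `q`, `x1`, `x2` are hypothetical and chosen only for this example.

```python
def branchy(a, b, cond):
    # Original control flow: exactly one assignment to x executes.
    if cond:
        x = a + b
    else:
        x = a - b
    return x * 2


def predicated(a, b, cond):
    # If-converted form: the branch is replaced by a complementary
    # predicate pair (p, q).
    p = bool(cond)
    q = not p

    # Both paths are computed. Each path writes its own renamed target,
    # so the two definitions no longer share the name x and carry no
    # false (output or anti) dependence on each other.
    x1 = a + b  # conceptually guarded by p
    x2 = a - b  # conceptually guarded by q

    # Commit only the value whose guarding predicate is true.
    assert p != q
    x = x1 if p else x2
    return x * 2
```

Because `x1` and `x2` are distinct names, a scheduler is free to issue the two computations in either order or in parallel; in the branching version the shared name `x` would force it to treat them as conflicting writes.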

Larry Carter | Brad Calder | Beth Simon | Jeanne Ferrante | Lori Carter
