Control CPR: a branch height reduction optimization for EPIC architectures

Exploiting high degrees of instruction-level parallelism is often hampered by frequent branching. Both exposed branch latency and low branch throughput can restrict parallelism. Control critical path reduction (control CPR) is a compilation technique that addresses these problems. Control CPR can reduce the dependence height of critical paths through branch operations and decrease the number of executed branches. In this paper, we present an approach to control CPR that uses profile statistics to recognize sequences of branches and applies the control CPR transformation to the predominant path through each sequence. We describe the approach and its implementation and present experimental results. This work demonstrates that control CPR enhances instruction-level parallelism for a variety of application programs and improves their performance across a range of processors.
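To make the idea concrete, the following is a minimal source-level sketch and not the implementation described in this paper, which operates on compiler intermediate code for EPIC processors. It assumes a linear search unrolled by four in which the exit branches are rarely taken, so the fall-through path is the predominant one. The function names `find_chain` and `find_cpr` and the `branches` counter are hypothetical and exist only for illustration. The baseline executes four dependent exit branches per unrolled iteration. The transformed version evaluates the four compares independently, combines them into one off-trace condition, and executes a single bypass branch on the predominant path. Compensation code replays the original branch order only when an exit is actually taken.

```python
def find_chain(values, key):
    """Baseline: four exit branches in sequence per unrolled iteration."""
    branches = 0
    tail = len(values) - len(values) % 4
    for i in range(0, tail, 4):
        for k in range(4):  # stands in for four branches executed in order
            branches += 1
            if values[i + k] == key:
                return i + k, branches
    for i in range(tail, len(values)):  # leftover elements
        branches += 1
        if values[i] == key:
            return i, branches
    return -1, branches


def find_cpr(values, key):
    """Sketch of the transformed code: one bypass branch on the main path."""
    branches = 0
    tail = len(values) - len(values) % 4
    for i in range(0, tail, 4):
        # The compares do not depend on one another, so a wide machine
        # could issue them in parallel (the loads are speculative here).
        c = [values[i + k] == key for k in range(4)]
        off_trace = c[0] | c[1] | c[2] | c[3]
        branches += 1  # the single branch on the predominant path
        if off_trace:
            # Compensation code: replay the original branch order.
            for k in range(4):
                branches += 1
                if c[k]:
                    return i + k, branches
    for i in range(tail, len(values)):  # leftover elements
        branches += 1
        if values[i] == key:
            return i, branches
    return -1, branches


if __name__ == "__main__":
    data = list(range(8))
    print(find_chain(data, 99))  # (-1, 8): eight exit branches executed
    print(find_cpr(data, 99))    # (-1, 2): two bypass branches executed
```

Both functions return the same index for every input. The sketch counts only the exit-test branches, not the loop back-branch. When the key is absent, the chain of four branches per iteration collapses to one, which mirrors both effects named above: fewer executed branches and a shorter chain of dependent branches on the predominant path.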
