A comparison of full and partial predicated execution support for ILP processors

One can effectively utilize predicated execution to improve branch handling in instruction-level parallel processors. Although the potential benefits of predicated execution are high, the tradeoffs involved in the design of an instruction set to support predicated execution can be difficult. On one end of the design spectrum, architectural support for full predicated execution requires increasing the number of source operands for all instructions. Full predicate support provides for the most flexibility and the largest potential performance improvements. On the other end, partial predicated execution support, such as conditional moves, requires very little change to existing architectures. This paper presents a preliminary study to qualitatively and quantitatively address the benefit of full and partial predicated execution support. With our current compiler technology, we show that the compiler can use both partial and full predication to achieve speedup in large control-intensive programs. Some details of the code generation techniques are shown to provide insight into the benefit of going from partial to full predication. Preliminary experimental results are very encouraging: partial predication provides an average of 33% performance improvement for an 8-issue processor with no predicate support while full predication provides an additional 30% improvement.

[1]  Ken Kennedy,et al.  Conversion of control dependence to data dependence , 1983, POPL '83.

[2]  Alan Jay Smith,et al.  Branch Prediction Strategies and Branch Target Buffer Design , 1995, Computer.

[3]  Peter Y.-T. Hsu,et al.  Highly concurrent scalar processing , 1986, ISCA '86.

[4]  Michael D. Smith,et al.  Limits on multiple instruction issue , 1989, ASPLOS III.

[5]  B. Ramakrishna Rau,et al.  The Cydra 5 departmental supercomputer: design philosophies, decisions, and trade-offs , 1989, Computer.

[6]  M. Schlansker,et al.  On Predicated Execution , 1991 .

[7]  Michael Shebanow,et al.  Single instruction stream parallelism is greater than two , 1991, ISCA '91.

[8]  David W. Wall,et al.  Limits of instruction-level parallelism , 1991, ASPLOS IV.

[9]  Scott Mahlke,et al.  Effective compiler support for predicated execution using the hyperblock , 1992, MICRO 1992.

[10]  Steven O. Hobbs,et al.  The GEM Optimizing Compiler System , 1992, Digit. Tech. J..

[11]  Yale N. Patt,et al.  A comparison of dynamic branch predictors that use two levels of branch history , 1993, ISCA '93.

[12]  Scott A. Mahlke,et al.  Characterizing the impact of predicated execution on branch prediction , 1994, Proceedings of MICRO-27. The 27th Annual IEEE/ACM International Symposium on Microarchitecture.

[13]  Vinod Kathail,et al.  Height reduction of control recurrences for ILP processors , 1994, Proceedings of MICRO-27. The 27th Annual IEEE/ACM International Symposium on Microarchitecture.

[14]  G. Sohi,et al.  Guarded execution and branch prediction in dynamic ILP processors , 1994, Proceedings of 21 International Symposium on Computer Architecture.