Dynamic hammock predication for non-predicated instruction set architectures

Conventional speculative architectures use branch prediction to select the most likely execution path during program execution. However, certain branches are difficult to predict. One solution to this problem is to evaluate both paths following such a conditional branch. Predicated execution can be used to implement this form of multi-path execution. Predicated architectures fetch and issue instructions that have associated predicates. These predicates indicate whether the instruction should commit its result. Predicating a branch reduces the number of branches executed, eliminating the chance of branch misprediction at the cost of executing additional instructions. In this paper, we propose a restricted form of multi-path execution called dynamic predication for architectures with little or no support for predicated instructions in their instruction set. Dynamic predication predicates, at run time, instruction sequences that form a branch hammock, concurrently executing both paths of the branch. A branch hammock is a short forward branch that spans a few instructions in the form of an if-then or if-then-else construct. We mark these and other constructs in the executable. When the decode stage detects such a sequence, it passes a predicated instruction sequence to a dynamically scheduled execution core. Our results show that dynamic predication can achieve speedups of up to 13%.
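To make the idea of predicating a hammock concrete, the following is a minimal software sketch, not the hardware mechanism proposed in the paper. It models an if-then-else hammock as two short instruction lists, tags each instruction with the predicate value under which it may commit, and then executes both paths while committing only the results whose guard matches the resolved condition. All names (`run_branching`, `predicate_hammock`, `run_predicated`, the register names) are hypothetical and chosen for illustration only.

```python
from typing import Callable, Dict, List, Tuple

Regs = Dict[str, int]
# An instruction is (destination register, function computing its value).
Instr = Tuple[str, Callable[[Regs], int]]


def run_branching(regs: Regs, cond: Callable[[Regs], bool],
                  then_path: List[Instr], else_path: List[Instr]) -> Regs:
    """Reference semantics: follow exactly one path of the hammock."""
    out = dict(regs)
    for dest, fn in (then_path if cond(out) else else_path):
        out[dest] = fn(out)
    return out


def predicate_hammock(then_path: List[Instr],
                      else_path: List[Instr]) -> List[Tuple[bool, Instr]]:
    """Tag every instruction with the predicate value under which it commits."""
    return [(True, i) for i in then_path] + [(False, i) for i in else_path]


def run_predicated(regs: Regs, cond: Callable[[Regs], bool],
                   predicated: List[Tuple[bool, Instr]]) -> Tuple[Regs, int]:
    """Execute both paths; commit a result only if its guard matches."""
    out = dict(regs)
    p = cond(out)          # the branch condition becomes a predicate
    executed = 0
    for guard, (dest, fn) in predicated:
        value = fn(out)    # the instruction executes on either path
        executed += 1
        if guard == p:     # only matching instructions commit
            out[dest] = value
    return out, executed


# Hammock (assumed example): if r1 < r2: r3 = r1 + 1
#                            else:       r3 = r2 - 1; r4 = r3 * 2
cond = lambda r: r["r1"] < r["r2"]
then_path = [("r3", lambda r: r["r1"] + 1)]
else_path = [("r3", lambda r: r["r2"] - 1), ("r4", lambda r: r["r3"] * 2)]
hammock = predicate_hammock(then_path, else_path)

taken = {"r1": 3, "r2": 5, "r3": 0, "r4": 0}
not_taken = {"r1": 7, "r2": 5, "r3": 0, "r4": 0}
```

In this sketch the predicated version always executes all three instructions, whichever way the condition resolves, which mirrors the trade-off stated above: no branch remains to mispredict, but extra instructions are executed.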
