Reducing branch misprediction penalty via selective branch recovery

The branch misprediction penalty consists of two components: the time wasted on misspeculative execution until the mispredicted branch is resolved, and the time needed to restart the pipeline with useful instructions once the branch is resolved. Current processor trends, namely large instruction windows and deep pipelines, amplify both components. We propose a novel method, called selective branch recovery (SBR), that reduces both components of the branch misprediction penalty. SBR exploits a frequently occurring type of control independence, exact convergence, in which the mispredicted path converges exactly at the beginning of the correct path. In such cases, SBR selectively reuses the results computed during misspeculative execution and obviates the need to fetch or rename the convergent instructions again. Thus, SBR addresses both components of the branch misprediction penalty. To increase the fraction of branch mispredictions that SBR can handle, we also present an effective means of inducing exact convergence on misspeculative paths. Over a baseline processor without SBR, SBR improves performance by 3% to 22% (8% on average) on a wide range of benchmarks.
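The following minimal Python sketch illustrates exact convergence and selective reuse. It is not the SBR hardware mechanism. It assumes a simplified trace model in which each instruction has a PC, at most one destination register, and a tuple of source registers. It also ignores memory dependences and any further branches on the convergent path. The names `Inst`, `find_exact_convergence`, and `selective_recovery` are hypothetical. In the sketch, exact convergence is detected when the wrong path reaches the first PC of the correct path. Wrong-path instructions before that point are squashed. A convergent instruction keeps its result only if none of its source registers hold a value derived, directly or transitively, from a squashed instruction. Otherwise it is marked for re-execution, but it is never refetched.

```python
from dataclasses import dataclass
from typing import List, Optional, Set, Tuple


@dataclass(frozen=True)
class Inst:
    """Hypothetical simplified instruction: PC, destination register, source registers."""
    pc: int
    dest: Optional[str]
    srcs: Tuple[str, ...] = ()


def find_exact_convergence(wrong_path: List[Inst], correct_start_pc: int) -> Optional[int]:
    """Return the index where the wrong path reaches the correct path's first PC, or None."""
    for i, inst in enumerate(wrong_path):
        if inst.pc == correct_start_pc:
            return i
    return None


def selective_recovery(
    wrong_path: List[Inst], correct_start_pc: int
) -> Optional[Tuple[List[Inst], List[Inst], List[Inst]]]:
    """Split a misspeculated path into (squashed, reused, reexecuted) lists.

    Assumption: only register dataflow is tracked; memory dependences are ignored.
    Returns None when the wrong path never reaches the correct path (full squash).
    """
    k = find_exact_convergence(wrong_path, correct_start_pc)
    if k is None:
        return None
    squashed = wrong_path[:k]
    # Registers written only on the wrong path hold values the correct path never produces.
    tainted: Set[str] = {i.dest for i in squashed if i.dest is not None}
    reused: List[Inst] = []
    reexecuted: List[Inst] = []
    for inst in wrong_path[k:]:
        if any(s in tainted for s in inst.srcs):
            # Convergent but fed by a bad value: re-execute (no refetch or rename needed).
            reexecuted.append(inst)
            if inst.dest is not None:
                tainted.add(inst.dest)
        else:
            # Inputs match the correct path, so the speculative result can be kept.
            reused.append(inst)
            if inst.dest is not None:
                tainted.discard(inst.dest)  # overwritten with a correct-path value
    return squashed, reused, reexecuted


# Hypothetical single-sided hammock:
#   0x10: beq r1, r0, 0x1c  ; predicted not-taken, actually taken
#   0x14: add r2, r2, 1     ; then-block (wrong path only)
#   0x18: mul r3, r4, 2     ; then-block (wrong path only)
#   0x1c: add r5, r2, r6    ; first PC of the correct path
#   0x20: sub r7, r4, r8
#   0x24: add r9, r5, r7
wrong = [
    Inst(0x14, "r2", ("r2",)),
    Inst(0x18, "r3", ("r4",)),
    Inst(0x1C, "r5", ("r2", "r6")),
    Inst(0x20, "r7", ("r4", "r8")),
    Inst(0x24, "r9", ("r5", "r7")),
]
squashed, reused, reexec = selective_recovery(wrong, correct_start_pc=0x1C)
print([hex(i.pc) for i in squashed])  # ['0x14', '0x18']
print([hex(i.pc) for i in reused])    # ['0x20']
print([hex(i.pc) for i in reexec])    # ['0x1c', '0x24']
```

Tracking taint per architectural register keeps the reuse decision to a single forward pass over the convergent instructions. A realistic design would also have to handle memory dependences and branches inside the convergent region, which this sketch omits.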
