Delay-Sensitive Branch Predictors for Future Technologies DRAFT

Accurate branch prediction is an essential component of modern, deeply pipelined microprocessors. Because the branch predictor is on the critical path for fetching instructions, it must deliver a prediction in a single cycle. However, as feature sizes shrink and clock rates increase, access delay will significantly limit the size, and therefore the accuracy, of branch predictors that can be accessed in a single cycle. Thus, there is a tradeoff between branch prediction accuracy and latency. Deeper pipelines improve overall performance by allowing more aggressive clock rates, but some performance is lost to increased branch misprediction penalties. Ironically, with shorter clock periods, the branch predictor has less time to make a prediction and might have to be scaled back to make it faster, which decreases accuracy and reduces the advantage of higher clock rates. We propose several methods for breaking the tradeoff between accuracy and latency in branch predictors. Our methods fall into two broad categories: hierarchical predictors using purely hardware implementations, and cooperative predictors that off-load some prediction work to the compiler. We describe hierarchical organizations that extend traditional predictors. We then describe a highly accurate branch predictor based on a neural learning technique. Using a hierarchical organization, this complex multi-cycle predictor can be used as a component of a fast delay-sensitive predictor. We introduce a novel cooperative branch predictor that off-loads most of the prediction work to the compiler with profiling. The compiler communicates profiled information to the microprocessor using extensions to the instruction set. This Boolean formula predictor has a small and fast hardware implementation, and will deliver a prediction in less than one cycle even in the smallest technologies with the most aggressive projected clock rates. Finally, we present another cooperative technique, branch path re-aliasing, that moves complexity off the critical path for making a prediction and into the compiler; this technique increases accuracy by reducing destructive aliasing during the less critical update stage.
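The abstract does not spell out the neural learning technique. As a rough illustration only, the sketch below assumes a perceptron-style predictor indexed by branch address and fed with global branch history. The class name, table size, history length, and the training threshold formula (taken from published perceptron predictor work) are assumptions made for this example, and hardware details such as weight saturation and multi-cycle access are left out.

```python
class PerceptronPredictor:
    """Software sketch of a perceptron-style branch predictor (illustrative)."""

    def __init__(self, num_perceptrons=64, history_length=8):
        self.num_perceptrons = num_perceptrons
        # Training threshold; this formula is an assumption borrowed from
        # published perceptron predictor work, not from this abstract.
        self.theta = int(1.93 * history_length + 14)
        # One weight vector per table entry; index 0 holds the bias weight.
        self.weights = [[0] * (history_length + 1)
                        for _ in range(num_perceptrons)]
        # Global history: +1 means taken, -1 means not taken.
        self.history = [-1] * history_length

    def _output(self, pc):
        w = self.weights[pc % self.num_perceptrons]
        return w[0] + sum(wi * xi for wi, xi in zip(w[1:], self.history))

    def predict(self, pc):
        """Predict taken when the perceptron output is non-negative."""
        return self._output(pc) >= 0

    def update(self, pc, taken):
        """Train on the resolved outcome, then shift it into the history."""
        y = self._output(pc)
        t = 1 if taken else -1
        # Train on a misprediction or when the output is not yet confident.
        if (y >= 0) != taken or abs(y) <= self.theta:
            w = self.weights[pc % self.num_perceptrons]
            w[0] += t
            for i, xi in enumerate(self.history):
                w[i + 1] += t * xi
        self.history = self.history[1:] + [t]


if __name__ == "__main__":
    predictor = PerceptronPredictor()
    correct = 0
    for i in range(200):
        outcome = (i % 2 == 0)  # a branch that alternates taken / not taken
        correct += predictor.predict(0x400) == outcome
        predictor.update(0x400, outcome)
    print(f"correct predictions: {correct} of 200")
```

Computing the dot product over a long history takes more than one cycle in hardware, which is why the abstract pairs such a predictor with a hierarchical organization: a small, fast component can supply the single-cycle prediction while the slower, more accurate component refines it.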
