Towards the adoption of Local Branch Predictors in Modern Out-of-Order Superscalar Processors

Branch prediction accuracy plays a dominant role in the performance of modern Out-of-Order (OOO) superscalar processors. While global history-based branch predictors are more popular, local history-based predictors offer an additional dimension for improving overall branch prediction accuracy. Integrating local predictors into modern cores, though, comes with non-trivial challenges in managing the local predictor's state. In particular, repairing this state on every branch misprediction is essential for the local predictor to operate effectively. Using a highly accurate, industry-standard simulator modeling a Skylake-like OOO core, and workloads spanning diverse categories including Server, High Performance Computing (HPC) and personal computing suites in addition to SPEC, we methodically highlight the issues that need to be tackled, why local predictor repair is non-trivial, and the performance opportunity that is lost when local predictor repair is not handled efficiently. We discuss the issues with prior techniques and quantify their limitations when they are used in current OOO cores. Further, we propose three practical, implementable and efficient repair techniques with minimal storage requirements that provide significant performance gains for local predictors. Unlike prior repair techniques, which attain only 50% of the oracular gains, our realistic repair techniques retain about 80% of the oracular gains, resulting in significantly better application performance.
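
To make the repair problem concrete, the following is a minimal sketch of a two-level local predictor whose per-branch history is updated speculatively at prediction time and rolled back on a misprediction. It is not one of the three techniques proposed in the paper. It assumes a simple checkpoint-and-walk scheme, and every name and size in it (`LocalPredictor`, `HIST_BITS`, `BHT_ENTRIES`, the `inflight` queue) is hypothetical and chosen only for illustration.

```python
from collections import deque

HIST_BITS = 4                      # local history length (assumed for illustration)
BHT_ENTRIES = 16                   # number of local history entries (assumed)
MASK = (1 << HIST_BITS) - 1


class LocalPredictor:
    """Toy local predictor with speculative history update and walk-based repair."""

    def __init__(self):
        self.bht = [0] * BHT_ENTRIES            # per-branch local histories
        self.pht = [2] * (1 << HIST_BITS)       # 2-bit counters, start weakly taken
        self.inflight = deque()                 # oldest first: (tag, index, history_before, predicted)
        self.next_tag = 0

    def predict(self, pc):
        idx = pc % BHT_ENTRIES
        before = self.bht[idx]
        taken = self.pht[before] >= 2
        # Speculatively shift the *predicted* direction into the local history.
        self.bht[idx] = ((before << 1) | int(taken)) & MASK
        tag = self.next_tag
        self.next_tag += 1
        self.inflight.append((tag, idx, before, taken))
        return tag, taken

    def resolve(self, tag, actual):
        """Train on the real outcome; repair local state if the prediction was wrong."""
        _, idx, before, predicted = next(e for e in self.inflight if e[0] == tag)
        counter = self.pht[before]
        self.pht[before] = min(3, counter + 1) if actual else max(0, counter - 1)
        if predicted == actual:
            return False
        # Misprediction: every younger branch is on the wrong path. Undo their
        # speculative history updates, youngest first, so that when several
        # squashed branches share an entry the oldest checkpoint is applied last.
        while self.inflight[-1][0] != tag:
            _, y_idx, y_before, _ = self.inflight.pop()
            self.bht[y_idx] = y_before
        # Finally, rebuild the mispredicted branch's history with the real outcome.
        self.bht[idx] = ((before << 1) | int(actual)) & MASK
        return True

    def retire_oldest(self):
        """Drop the checkpoint of the oldest branch once it can no longer be squashed."""
        self.inflight.popleft()
```

The sketch shows why repair is costly: unlike a single global history register, local state is spread over many table entries, so one misprediction may require restoring as many entries as there are younger in-flight branches. A call such as `LocalPredictor().predict(pc)` returns a tag that is later passed to `resolve(tag, actual)`.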
