EXACT: explicit dynamic-branch prediction with active updates

Branches that depend directly or indirectly on load instructions are a leading cause of mispredictions in state-of-the-art branch predictors. For a branch of this type, each unique combination of producer-load addresses defines a unique dynamic instance of the branch. Based on this definition, a study of mispredictions reveals two related problems.

(i) Global branch history often fails to distinguish between different dynamic branches. In this case, the predictor cannot specialize its predictions for different dynamic branches, which causes mispredictions when their outcomes differ. Ideally, the remedy is to predict a dynamic branch using its program counter (PC) and the addresses of its producer loads, since this context uniquely identifies the dynamic branch. We call this context the identity, or ID, of the dynamic branch. In general, however, the producer loads are unlikely to have generated their addresses by the time the dynamic branch is fetched. We show that the ID of a distant retired branch in the global branch stream, combined with recent global branch history, is effective context for predicting the current branch.

(ii) Fixing the first problem exposes a second one. A store to an address on which a dynamic branch depends may flip the branch's outcome the next time it is encountered. With conventional passive updates, the branch suffers a misprediction before the predictor is retrained. We propose that stores to the memory addresses on which a dynamic branch depends should directly update its prediction in the predictor. This novel "active update" concept avoids the mispredictions that conventional passive training would otherwise incur.

We highlight two practical features that enable large EXACT predictors. First, the prediction path is scalably pipelinable by virtue of its decoupled indexing strategy. Second, active updates tolerate hundreds of cycles of latency, which makes them well suited to virtualizing this component in the general-purpose memory hierarchy.
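
The Python sketch below illustrates the two ideas above. It rests on heavily simplified, hypothetical assumptions and is not the EXACT hardware design.

- `context_index` hashes a distant retired branch's ID with recent global history to select a table entry. The hash, table size, and history length are arbitrary illustrative choices.
- `ToyExactPredictor` records, at retirement, which producer-load addresses the predicted branch depended on. A later store to one of those addresses can then rewrite the entry (an active update) instead of waiting for a misprediction.
- To recompute an outcome, the sketch assumes the branch condition is available as a function of the loaded values. How real hardware derives the new prediction is not modeled.

```python
from collections import defaultdict
from typing import Callable, Dict, Optional, Set, Tuple

# Dynamic-branch ID: (branch PC, addresses of its producer loads).
DynamicID = Tuple[int, Tuple[int, ...]]


def context_index(distant_id: DynamicID, recent_history: int,
                  history_bits: int = 8, table_bits: int = 12) -> int:
    """Hypothetical hash of a distant retired branch's ID with recent global history."""
    pc, load_addrs = distant_id
    h = pc
    for addr in load_addrs:
        h = ((h * 31) ^ addr) & 0xFFFFFFFF
    h ^= recent_history & ((1 << history_bits) - 1)
    return h & ((1 << table_bits) - 1)


class ToyExactPredictor:
    """Idealized tagless table with store-driven active updates (illustrative only)."""

    def __init__(self) -> None:
        self.table: Dict[int, bool] = {}
        # Assumption: each entry keeps its branch condition as a function of the
        # loaded values, so a store can recompute the outcome directly.
        self.entry: Dict[int, Tuple[Tuple[int, ...], Callable[..., bool]]] = {}
        self.dependents: Dict[int, Set[int]] = defaultdict(set)  # address -> indices
        self.memory: Dict[int, int] = {}

    def predict(self, idx: int) -> Optional[bool]:
        return self.table.get(idx)  # None: no entry, defer to a baseline predictor

    def train(self, idx: int, current_id: DynamicID, outcome: bool,
              condition: Callable[..., bool]) -> None:
        """Passive update at retirement, when producer-load addresses are known."""
        _, load_addrs = current_id
        self.table[idx] = outcome
        self.entry[idx] = (load_addrs, condition)
        for addr in load_addrs:
            self.dependents[addr].add(idx)

    def store(self, addr: int, value: int) -> None:
        """Active update: recompute predictions of entries that depend on addr."""
        self.memory[addr] = value
        for idx in self.dependents.get(addr, set()):
            load_addrs, condition = self.entry[idx]
            values = [self.memory.get(a, 0) for a in load_addrs]
            self.table[idx] = condition(*values)


p = ToyExactPredictor()
p.memory[0x1000] = 5
idx = context_index(distant_id=(0x3F0, (0x2000,)), recent_history=0b1011)
# Hypothetical branch at PC 0x400 testing `*(0x1000) > 3`.
p.train(idx, current_id=(0x400, (0x1000,)), outcome=True, condition=lambda v: v > 3)
p.store(0x1000, 1)     # the store flips the branch's condition
print(p.predict(idx))  # False
```

In this toy, the prediction is corrected as soon as the store executes. With passive training alone, the branch would first mispredict and only then be retrained.
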
We also present a compact form of the predictor that caches only those dynamic instances of a static branch whose outcomes differ from the branch's overall bias.
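Below is a minimal sketch of the compact form. It assumes a per-PC 2-bit saturating counter as the static branch's bias and a small, FIFO-evicted cache of exception entries keyed by dynamic-branch ID. The counter, its default state, the capacity, and the eviction policy are illustrative assumptions, not the paper's parameters.

```python
from typing import Dict, Tuple

# Dynamic-branch ID: (branch PC, addresses of its producer loads).
DynamicID = Tuple[int, Tuple[int, ...]]


class CompactExact:
    """Hypothetical compact variant: per-PC bias plus a cache of exception instances."""

    def __init__(self, capacity: int = 1024) -> None:
        self.bias: Dict[int, int] = {}               # PC -> 2-bit counter, 0..3
        self.exceptions: Dict[DynamicID, bool] = {}  # only instances that defy the bias
        self.capacity = capacity

    def _bias_taken(self, pc: int) -> bool:
        return self.bias.get(pc, 2) >= 2             # unseen PCs start weakly taken

    def predict(self, dyn_id: DynamicID) -> bool:
        if dyn_id in self.exceptions:
            return self.exceptions[dyn_id]
        return self._bias_taken(dyn_id[0])

    def train(self, dyn_id: DynamicID, taken: bool) -> None:
        pc = dyn_id[0]
        ctr = self.bias.get(pc, 2)
        self.bias[pc] = min(ctr + 1, 3) if taken else max(ctr - 1, 0)
        if taken != self._bias_taken(pc):
            # Outcome disagrees with the (updated) bias: cache this dynamic instance.
            if dyn_id not in self.exceptions and len(self.exceptions) >= self.capacity:
                self.exceptions.pop(next(iter(self.exceptions)))  # FIFO eviction
            self.exceptions[dyn_id] = taken
        else:
            # Outcome matches the bias, so an entry would be redundant.
            self.exceptions.pop(dyn_id, None)


c = CompactExact()
loop_pc = 0x500
for addr in range(0x1000, 0x1040, 4):
    c.train((loop_pc, (addr,)), taken=True)  # 16 instances that follow the bias
c.train((loop_pc, (0x2000,)), taken=False)   # one outlier instance
print(len(c.exceptions))                     # 1
print(c.predict((loop_pc, (0x2000,))))       # False
print(c.predict((loop_pc, (0x1004,))))       # True (falls back to the bias)
```

Because instances that follow the bias need no entries, cache capacity is spent only on the dynamic instances that a bias-only prediction would get wrong.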