Load Value Prediction via Path-based Address Prediction: Avoiding Mispredictions due to Conflicting Stores

Current flagship processors excel at extracting instruction-level parallelism (ILP) by forming large instruction windows. Even then, extracting ILP is inherently limited by true data dependencies. Value prediction was proposed to address this limitation. Value prediction faces many challenges; in this work we focus on two of them. Challenge #1: store instructions change the values in memory, rendering the values held in the value predictor stale and resulting in value mispredictions and a retraining penalty. Challenge #2: value mispredictions trigger costly pipeline flushes. To minimize the number of pipeline flushes, value predictors employ stringent, yet necessary, high-confidence requirements to guarantee high prediction accuracy. Such requirements can lengthen training time and reduce coverage. In this work, we propose Decoupled Load Value Prediction (DLVP), a technique that targets the value prediction challenges for load instructions. DLVP mitigates the stale state caused by stores by replacing value prediction with memory address prediction. It then opportunistically probes the data cache to retrieve the value(s) corresponding to the predicted address(es) early enough for value prediction to take place. Since the values captured in the data cache mirror the current program data (except for in-flight stores), this addresses the first challenge. Regarding the second challenge, DLVP reduces pipeline flushes by using a new context-based address prediction scheme that leverages load-path history to deliver high address prediction accuracy (over 99%) with relaxed confidence requirements. We call this address prediction scheme Path-based Address Prediction (PAP). With a modest 8 KB prediction table, DLVP improves performance by up to 71%, and by 4.8% on average, without increasing the core energy consumption.

CCS CONCEPTS • Computer systems organization → Superscalar architectures; Pipeline computing; Reduced instruction set computing;
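To make the mechanism concrete, here is a minimal software sketch of the idea: predict a load's address from its PC hashed with a load-path history, then read the value from a stand-in data cache rather than from a value table. The abstract does not give the hash function, table geometry, confidence thresholds, or update policy, so everything named here (`PathAddressPredictor`, `Entry`, `demo`, the bit widths, the XOR hash) is an assumption chosen for illustration, not the paper's design. In-flight stores and pipeline timing are not modeled.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple


@dataclass
class Entry:
    tag: int = -1   # -1 marks an empty entry
    addr: int = 0   # last address observed for this (PC, path) context
    conf: int = 0   # saturating confidence counter


class PathAddressPredictor:
    """Toy address predictor indexed by load PC hashed with load-path history.

    All parameters are illustrative assumptions, not values from the paper.
    """

    def __init__(self, index_bits: int = 10, tag_bits: int = 14,
                 history_bits: int = 16, conf_threshold: int = 2,
                 conf_max: int = 3) -> None:
        self.index_bits = index_bits
        self.tag_bits = tag_bits
        self.history_mask = (1 << history_bits) - 1
        self.conf_threshold = conf_threshold
        self.conf_max = conf_max
        self.table: List[Entry] = [Entry() for _ in range(1 << index_bits)]
        self.history = 0  # folded PCs of recently executed loads

    def _index_and_tag(self, pc: int) -> Tuple[int, int]:
        mixed = pc ^ self.history  # assumed hash: simple XOR
        index = mixed & ((1 << self.index_bits) - 1)
        tag = (mixed >> self.index_bits) & ((1 << self.tag_bits) - 1)
        return index, tag

    def predict(self, pc: int) -> Optional[int]:
        """Return a predicted address, or None if not confident enough."""
        index, tag = self._index_and_tag(pc)
        entry = self.table[index]
        if entry.tag == tag and entry.conf >= self.conf_threshold:
            return entry.addr
        return None

    def train(self, pc: int, actual_addr: int) -> None:
        """Update the table with the resolved address, then extend the path."""
        index, tag = self._index_and_tag(pc)
        entry = self.table[index]
        if entry.tag == tag and entry.addr == actual_addr:
            entry.conf = min(entry.conf + 1, self.conf_max)
        else:
            # New context or wrong address: (re)allocate with zero confidence.
            entry.tag, entry.addr, entry.conf = tag, actual_addr, 0
        # Fold this load's PC into the load-path history.
        self.history = ((self.history << 2) ^ (pc >> 2)) & self.history_mask


def demo() -> Tuple[Optional[int], Optional[int]]:
    """Train on two alternating loads, apply a store, then predict and probe."""
    memory: Dict[int, int] = {0x1000: 7, 0x2000: 11}  # stands in for the data cache
    loads = [(0x400100, 0x1000), (0x400200, 0x2000)]  # (load PC, address)
    pap = PathAddressPredictor()
    for _ in range(20):
        for pc, addr in loads:
            pap.train(pc, addr)
    memory[0x1000] = 99  # a committed store changes the value in memory
    predicted = pap.predict(0x400100)
    value = memory.get(predicted) if predicted is not None else None
    return predicted, value
```

In this sketch, `demo()` returns the predicted address `0x1000` together with the post-store value `99`: because the predictor stores addresses rather than values, the store does not make its state stale, and probing memory at the predicted address yields the current value. A table that cached the value `7` directly would have mispredicted here and needed retraining.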
