Selective Re-Execution and its Implications for Value Speculation

In this paper, we describe a lightweight protocol to support selective re-execution on the TRIPS processor. The protocol permits multiple waves of speculation to traverse a dataflow graph simultaneously and in any order, while a cleanup “commit” wave also propagates to determine when a group of instructions has completed. The protocol is fully distributed: it consists of point-to-point messages and requires no centralized control. As a result, recovery from a value misspeculation requires no additional fetching or decoding of instructions, and instructions that were independent of the faulting instruction are not reissued. We briefly describe one way to exploit this protocol: giving every instruction its own decentralized last-value predictor. Our results show that with this scheme up to 26% of all instructions can fire as soon as they are fetched, while only 0.001% of instructions fire incorrectly.
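To make the recovery property concrete, here is a minimal Python sketch of selective re-execution driven by per-instruction last-value prediction. It is not the TRIPS protocol. The abstract does not specify message formats, so the names (`Inst`, `run`, `example_graph`), the message tuples, and the commit-counting rule are all hypothetical. The sketch assumes an acyclic graph in which each instruction names distinct producers. Unlike the protocol described above, which tolerates waves arriving in any order, it also assumes that every producer-to-consumer link delivers messages in order, so a newer value simply overwrites the older operand. An instruction with a prediction fires at fetch. When its real result differs, it sends a new wave only to its consumers, and an instruction commits once every one of its producers has committed.

```python
from collections import deque
from dataclasses import dataclass, field
from typing import Callable, Deque, Dict, List, Optional, Tuple

# Hypothetical message: (kind, producer, consumer, value); kind is "value" or "commit".
Msg = Tuple[str, str, str, Optional[int]]


@dataclass
class Inst:
    """One instruction in a hypothetical dataflow graph (assumed acyclic)."""
    srcs: List[str]                       # distinct producer names; empty for a source
    op: Optional[Callable[..., int]] = None
    const: Optional[int] = None           # value produced by a source instruction
    prediction: Optional[int] = None      # hypothetical per-instruction last-value entry
    operands: Dict[str, int] = field(default_factory=dict)
    sent: Optional[int] = None            # last value broadcast to consumers
    commits: int = 0                      # commit tokens received from producers
    committed: bool = False
    executions: int = 0                   # real executions (predicted fires not counted)


def run(graph: Dict[str, Inst]) -> Dict[str, Inst]:
    consumers: Dict[str, List[str]] = {name: [] for name in graph}
    for name, inst in graph.items():
        for src in inst.srcs:
            consumers[src].append(name)
    # One global FIFO stands in for the point-to-point links; it keeps the
    # messages on each producer->consumer link in the order they were sent.
    queue: Deque[Msg] = deque()

    def broadcast(name: str, value: Optional[int]) -> None:
        graph[name].sent = value
        for c in consumers[name]:
            queue.append(("value", name, c, value))

    def commit(name: str) -> None:
        inst = graph[name]
        inst.committed = True
        inst.prediction = inst.sent       # train the last-value predictor
        for c in consumers[name]:
            queue.append(("commit", name, c, None))

    # Fetch: sources produce and commit; predicted instructions fire at once.
    for name, inst in graph.items():
        if not inst.srcs:
            inst.executions += 1
            broadcast(name, inst.const)
            commit(name)
        elif inst.prediction is not None:
            broadcast(name, inst.prediction)

    while queue:
        kind, src, dst, value = queue.popleft()
        inst = graph[dst]
        if kind == "value":
            inst.operands[src] = value    # a newer wave overwrites the older operand
            if len(inst.operands) == len(inst.srcs):
                inst.executions += 1
                result = inst.op(*(inst.operands[s] for s in inst.srcs))
                if result != inst.sent:   # only a changed value starts a new wave
                    broadcast(dst, result)
        else:
            inst.commits += 1
            if inst.commits == len(inst.srcs):
                commit(dst)               # all operands are final, so is the result
    return graph


def example_graph() -> Dict[str, Inst]:
    return {
        "a": Inst([], const=3),
        "b": Inst([], const=4),
        "c": Inst(["a", "b"], op=lambda x, y: x + y, prediction=7),   # correct guess
        "d": Inst(["c"], op=lambda x: x * 2, prediction=10),          # wrong guess (real: 14)
        "e": Inst(["b"], op=lambda x: x - 1),                         # independent of d
        "f": Inst(["d", "e"], op=lambda x, y: x + y),
    }
```

In `run(example_graph())`, the misprediction at `d` is repaired by a second wave that reaches only `f`. As a result, `f` executes twice (first on the wrong 10, then on the corrected 14) and ends at 17. The independent instruction `e` executes once, and every instruction commits.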
