Multi-level Hardware Prefetching Using Low Complexity Delta Correlating Prediction Tables with Partial Matching

This paper presents a low-complexity, table-based approach to delta correlation prefetching. Our approach uses a table indexed by the address of the load instruction, which stores the most recently observed deltas between that load's miss addresses. Storing deltas rather than full miss addresses saves considerable space and makes pattern matching easier. Using delta correlation, the delta history can predict repeating patterns with long periods. In addition, we propose two extensions: L1 hoisting, a technique that moves data from the L2 to the L1 using the same underlying table structure, and partial matching, which reduces the spatial resolution of the delta stream to expose more patterns. We evaluate our prefetcher using the simulator framework from the Data Prefetching Championship (DPC). This lets us use the original code submitted to the contest to evaluate several alternative prefetching techniques fairly. Our prefetcher increases performance by 87% on average (up to 6.6×) on SPEC CPU2006.
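The core mechanism can be illustrated with a small software model. The following is a minimal Python sketch, not the authors' hardware design. It assumes a table keyed by the load instruction's address (PC), where each entry keeps the last miss address and a bounded history of deltas. On each miss, the model searches backwards for an earlier occurrence of the two most recent deltas and then replays the deltas that followed that occurrence to form prefetch addresses. The names `DeltaCorrelatingTable`, `access`, `max_deltas` and `ignore_low_bits` are hypothetical. The table is an unbounded dictionary rather than a fixed-size structure, and prefetch filtering and the L1 hoisting path are not modeled. Partial matching is assumed here to mean clearing low-order delta bits only when comparing deltas; this is an illustrative reading of "reducing the spatial resolution", not a confirmed detail.

```python
from collections import deque


class DeltaEntry:
    """Per-PC table row: the last miss address and a bounded delta history."""

    def __init__(self, addr: int, max_deltas: int) -> None:
        self.last_addr = addr
        # Oldest delta first; appending to a full deque drops the oldest one.
        self.deltas: deque = deque(maxlen=max_deltas)


class DeltaCorrelatingTable:
    """Sketch of a PC-indexed delta correlating table (hypothetical API)."""

    def __init__(self, max_deltas: int = 16, ignore_low_bits: int = 0) -> None:
        self.table: dict = {}  # PC -> DeltaEntry; unbounded here, small in hardware
        self.max_deltas = max_deltas
        # Assumed form of partial matching: clear low delta bits before comparing.
        self.mask = ~((1 << ignore_low_bits) - 1)

    def _same(self, a: int, b: int) -> bool:
        return (a & self.mask) == (b & self.mask)

    def access(self, pc: int, addr: int) -> list:
        """Record a miss by load `pc` at `addr` and return prefetch addresses."""
        entry = self.table.get(pc)
        if entry is None:
            self.table[pc] = DeltaEntry(addr, self.max_deltas)
            return []
        delta = addr - entry.last_addr
        entry.last_addr = addr
        if delta == 0:
            return []  # repeated address: nothing new to learn
        entry.deltas.append(delta)
        return self._correlate(list(entry.deltas), addr)

    def _correlate(self, d: list, addr: int) -> list:
        n = len(d)
        if n < 3:
            return []  # need the latest pair plus at least one older delta
        a, b = d[-2], d[-1]
        # Walk backwards to find the most recent earlier occurrence of (a, b).
        for i in range(n - 2, 0, -1):
            if self._same(d[i - 1], a) and self._same(d[i], b):
                prefetches = []
                target = addr
                # Replay the full-precision deltas that followed the match.
                for step in d[i + 1:]:
                    target += step
                    prefetches.append(target)
                return prefetches
        return []


if __name__ == "__main__":
    # A load whose misses repeat the delta pattern 64, 64, 192.
    t = DeltaCorrelatingTable()
    for a in [0, 64, 128, 320, 384, 448]:
        t.access(0x400, a)
    print(t.access(0x400, 640))  # [704, 768, 960]
```

Searching backwards returns the most recent match, so the replayed deltas span the shortest repeating period still held in the history. With `ignore_low_bits=6`, deltas such as 64 and 70 compare equal, so a slightly irregular stream can still produce a match that exact comparison would miss.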
