Best-offset hardware prefetching

Hardware prefetching is an important feature of modern high-performance processors. When the application working set is too large to fit in on-chip caches, disabling hardware pre-fetchers may result in severe performance reduction. A new prefetcher was recently introduced, the Sandbox prefetcher, that tries to find dynamically the best prefetch offset using the sandbox method. The Sandbox prefetcher uses simple hardware and was shown to be quite effective. However, the sandbox method does not take into account prefetch timeliness. We propose an offset prefetcher with a new method for selecting the prefetch offset that takes into account prefetch timeliness. We show that our Best-Offset prefetcher outperforms the Sandbox prefetcher on the SPEC CPU2006 benchmarks, with equally simple hardware.

[1]  Anand Sivasubramaniam,et al.  Going the distance for TLB prefetching: an application-driven study , 2002, ISCA.

[2]  Kimming So,et al.  Cache Operations by MRU Change , 1988, IEEE Trans. Computers.

[3]  James E. Smith,et al.  Data Cache Prefetching Using a Global History Buffer , 2004, 10th International Symposium on High Performance Computer Architecture (HPCA'04).

[4]  Wei-Fen Lin,et al.  Reducing DRAM latencies with an integrated memory hierarchy design , 2001, Proceedings HPCA Seventh International Symposium on High-Performance Computer Architecture.

[5]  Onur Mutlu,et al.  Software-Based Online Detection of Hardware Defects Mechanisms, Architectural Support, and Evaluation , 2007, 40th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO 2007).

[6]  Kathryn S. McKinley,et al.  Guided region prefetching: a cooperative hardware/software approach , 2003, ISCA '03.

[7]  Pierre Michaud The 3P and 4P cache replacement policies , 2010 .

[8]  Thomas F. Wenisch,et al.  Spatio-temporal memory streaming , 2009, ISCA '09.

[9]  K. Kavi Cache Memories Cache Memories in Uniprocessors. Reading versus Writing. Improving Performance , 2022 .

[10]  Carole-Jean Wu,et al.  PACMan: Prefetch-Aware Cache Management for high performance caching , 2011, 2011 44th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO).

[11]  Douglas J. Joseph,et al.  Prefetching Using Markov Predictors , 1997, Conference Proceedings. The 24th Annual International Symposium on Computer Architecture.

[12]  Alexander V. Veidenbaum,et al.  Multiple stream tracker: a new hardware stride prefetcher , 2014, Conf. Computing Frontiers.

[13]  K.J. Nesbit,et al.  AC/DC: an adaptive data cache prefetcher , 2004, Proceedings. 13th International Conference on Parallel Architecture and Compilation Techniques, 2004. PACT 2004..

[14]  Alan Jay Smith,et al.  Sequentiality and prefetching in database systems , 1978, TODS.

[15]  William J. Dally,et al.  Memory access scheduling , 2000, Proceedings of 27th International Symposium on Computer Architecture (IEEE Cat. No.RS00201).

[16]  Calvin Lin,et al.  Memory Prefetching Using Adaptive Stream Detection , 2006, 2006 39th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO'06).

[17]  Erik Hagersten,et al.  Toward Scalable Cache Only Memory Architectures , 2003 .

[18]  Janak H. Patel,et al.  Stride directed prefetching in scalar processors , 1992, MICRO.

[19]  Harish Patil,et al.  Pin: building customized program analysis tools with dynamic instrumentation , 2005, PLDI '05.

[20]  Kei Hiraki,et al.  Access Map Pattern Matching for High Performance Data Cache Prefetch , 2011, J. Instr. Level Parallelism.

[21]  Calvin Lin,et al.  Linearizing irregular memory accesses for improved correlated prefetching , 2013, 2013 46th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO).

[22]  Thomas F. Wenisch,et al.  Spatial Memory Streaming , 2006, 33rd International Symposium on Computer Architecture (ISCA'06).

[23]  Onur Mutlu,et al.  Mitigating Prefetcher-Caused Pollution Using Informed Caching Policies for Prefetched Blocks , 2014, ACM Trans. Archit. Code Optim..

[24]  Michel Dubois,et al.  International Conference on Parallel Processing Fixed and Adaptive Sequential Prefetching in Shared Memory Multiprocessors , 2006 .

[25]  Onur Mutlu,et al.  Feedback Directed Prefetching: Improving the Performance and Bandwidth-Efficiency of Hardware Prefetchers , 2007, 2007 IEEE 13th International Symposium on High Performance Computer Architecture.

[26]  Richard E. Kessler,et al.  Evaluating stream buffers as a secondary cache replacement , 1994, Proceedings of 21 International Symposium on Computer Architecture.

[27]  Yuan Chou,et al.  Low-Cost Epoch-Based Correlation Prefetching for Commercial Applications , 2007, 40th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO 2007).

[28]  Olivier Temam,et al.  MicroLib: A Case for the Quantitative Comparison of Micro-Architecture Mechanisms , 2004, 37th International Symposium on Microarchitecture (MICRO-37'04).

[29]  Pierre Michaud,et al.  A case for (partially) TAgged GEometric history length branch prediction , 2006, J. Instr. Level Parallelism.

[30]  Aamer Jaleel,et al.  High performance cache replacement using re-reference interval prediction (RRIP) , 2010, ISCA.

[31]  David Kroft,et al.  Lockup-free instruction fetch/prefetch cache organization , 1998, ISCA '81.

[32]  Aamer Jaleel,et al.  Sandbox Prefetching: Safe run-time evaluation of aggressive prefetchers , 2014, 2014 IEEE 20th International Symposium on High Performance Computer Architecture (HPCA).

[33]  Joel S. Emer,et al.  Memory dependence prediction using store sets , 1998, Proceedings. 25th Annual International Symposium on Computer Architecture (Cat. No.98CB36235).

[34]  Ivan Sklenár Prefetch unit for vector operations on scalar computers , 1992, CARN.

[35]  Norman P. Jouppi,et al.  Improving direct-mapped cache performance by the addition of a small fully-associative cache and prefetch buffers , 1990, [1990] Proceedings. The 17th Annual International Symposium on Computer Architecture.

[36]  Aamer Jaleel,et al.  Adaptive insertion policies for high performance caching , 2007, ISCA '07.

[37]  Jean-Loup Baer,et al.  Effective Hardware Based Data Prefetching for High-Performance Processors , 1995, IEEE Trans. Computers.