A hierarchical neural model of data prefetching

This paper presents Voyager, a novel neural network for data prefetching. Unlike previous neural models for prefetching, which are limited to learning delta correlations, our model can also learn address correlations, which are important for prefetching irregular sequences of memory accesses. The key to our solution is its hierarchical structure that separates addresses into pages and offsets and that introduces a mechanism for learning important relations among pages and offsets. Voyager provides significant prediction benefits over current data prefetchers. For a set of irregular programs from the SPEC 2006 and GAP benchmark suites, Voyager sees an average IPC improvement of 41.6% over a system with no prefetcher, compared with 21.7% and 28.2%, respectively, for idealized Domino and ISB prefetchers. We also find that for two commercial workloads for which current data prefetchers see very little benefit, Voyager dramatically improves both accuracy and coverage. At present, slow training and prediction preclude neural models from being practically used in hardware, but Voyager’s overheads are significantly lower—in every dimension—than those of previous neural models. For example, computation cost is reduced by 15- 20×, and storage overhead is reduced by 110-200×. Thus, Voyager represents a significant step towards a practical neural prefetcher.

[1]  Trishul M. Chilimbi Efficient representations and abstractions for quantifying and exploiting data reference locality , 2001, PLDI '01.

[2]  Jürgen Schmidhuber,et al.  Long Short-Term Memory , 1997, Neural Computation.

[3]  Thomas F. Wenisch,et al.  Spatial Memory Streaming , 2006, 33rd International Symposium on Computer Architecture (ISCA'06).

[4]  Sanjeev Kumar,et al.  Exploiting spatial locality in data caches using spatial footprints , 1998, ISCA.

[5]  Zhe Wang,et al.  Perceptron learning for reuse prediction , 2016, 2016 49th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO).

[6]  Viktor K. Prasanna,et al.  Predicting memory accesses: the road to compact ML-driven prefetcher , 2019, MEMSYS.

[7]  Douglas J. Joseph,et al.  Prefetching Using Markov Predictors , 1997, Conference Proceedings. The 24th Annual International Symposium on Computer Architecture.

[8]  James E. Smith,et al.  Data Cache Prefetching Using a Global History Buffer , 2004, 10th International Symposium on High Performance Computer Architecture (HPCA'04).

[9]  Daniel A. Jiménez,et al.  Dynamic branch prediction with perceptrons , 2001, Proceedings HPCA Seventh International Symposium on High-Performance Computer Architecture.

[10]  Alan Jay Smith,et al.  Sequential Program Prefetching in Memory Hierarchies , 1978, Computer.

[11]  Wei-Fen Lin,et al.  Filtering superfluous prefetches using density vectors , 2001, Proceedings 2001 IEEE International Conference on Computer Design: VLSI in Computers and Processors. ICCD 2001.

[12]  Wen-mei W. Hwu,et al.  Run-time spatial locality detection and optimization , 1997, Proceedings of 30th Annual International Symposium on Microarchitecture.

[13]  Samira Manabi Khan,et al.  Sampling Dead Block Prediction for Last-Level Caches , 2010, 2010 43rd Annual IEEE/ACM International Symposium on Microarchitecture.

[14]  Margaret Martonosi,et al.  TCP: tag correlating prefetchers , 2003, The Ninth International Symposium on High-Performance Computer Architecture, 2003. HPCA-9 2003. Proceedings..

[15]  Tim Kraska,et al.  The Case for Learned Index Structures , 2018 .

[16]  Richard E. Kessler,et al.  Evaluating stream buffers as a secondary cache replacement , 1994, Proceedings of 21 International Symposium on Computer Architecture.

[17]  Brad Calder,et al.  A Decoupled Predictor-Directed Stream Prefetching Architecture , 2003, IEEE Trans. Computers.

[18]  Calvin Lin,et al.  Memory Prefetching Using Adaptive Stream Detection , 2006, 2006 39th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO'06).

[19]  Pierre Michaud Best-offset hardware prefetching , 2016, 2016 IEEE International Symposium on High Performance Computer Architecture (HPCA).

[20]  Carole-Jean Wu,et al.  SHiP: Signature-based Hit Predictor for high performance caching , 2011, 2011 44th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO).

[21]  Hong Wang,et al.  Improving Branch Prediction By Modeling Global History with Convolutional Neural Networks , 2019, ArXiv.

[22]  Jinchun Kim,et al.  Path confidence based lookahead prefetching , 2016, 2016 49th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO).

[23]  Grigorios Tsoumakas,et al.  Multi-Label Classification: An Overview , 2007, Int. J. Data Warehous. Min..

[24]  Seth H. Pugsley,et al.  Efficiently prefetching complex address patterns , 2015, 2015 48th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO).

[25]  Norman P. Jouppi,et al.  Memory-System Design Considerations for Dynamically-Scheduled Processors , 1997, ISCA.

[26]  Brad Calder,et al.  SimPoint 3.0: Faster and More Flexible Program Phase Analysis , 2005, J. Instr. Level Parallelism.

[27]  Calvin Lin,et al.  Linearizing irregular memory accesses for improved correlated prefetching , 2013, 2013 46th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO).

[28]  B. Jacob,et al.  CMP $ im : A Pin-Based OnThe-Fly Multi-Core Cache Simulator , 2008 .

[29]  Yuan Yu,et al.  TensorFlow: A system for large-scale machine learning , 2016, OSDI.

[30]  Thomas F. Wenisch,et al.  Spatio-temporal memory streaming , 2009, ISCA '09.

[31]  Akanksha Jain,et al.  Back to the Future: Leveraging Belady's Algorithm for Improved Cache Replacement , 2016, 2016 ACM/IEEE 43rd Annual International Symposium on Computer Architecture (ISCA).

[32]  Alex Graves,et al.  Playing Atari with Deep Reinforcement Learning , 2013, ArXiv.

[33]  Hao Wu,et al.  Efficient Metadata Management for Irregular Data Prefetching , 2019, 2019 ACM/IEEE 46th Annual International Symposium on Computer Architecture (ISCA).

[34]  Uri C. Weiser,et al.  Semantic locality and context-based prefetching using reinforcement learning , 2015, 2015 ACM/IEEE 42nd Annual International Symposium on Computer Architecture (ISCA).

[35]  Leeor Peled,et al.  A Neural Network Prefetcher for Arbitrary Memory Access Patterns , 2019, ACM Trans. Archit. Code Optim..

[36]  Yuan Chou,et al.  Low-Cost Epoch-Based Correlation Prefetching for Commercial Applications , 2007, 40th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO 2007).

[37]  Jean-Loup Baer,et al.  Effective Hardware Based Data Prefetching for High-Performance Processors , 1995, IEEE Trans. Computers.

[38]  Yale N. Patt,et al.  BranchNet: A Convolutional Neural Network to Predict Hard-To-Predict Branches , 2020, 2020 53rd Annual IEEE/ACM International Symposium on Microarchitecture (MICRO).

[39]  Aamer Jaleel,et al.  Sandbox Prefetching: Safe run-time evaluation of aggressive prefetchers , 2014, 2014 IEEE 20th International Symposium on High Performance Computer Architecture (HPCA).

[40]  Geoffrey E. Hinton,et al.  ImageNet classification with deep convolutional neural networks , 2012, Commun. ACM.

[41]  Derek Bruening,et al.  An infrastructure for adaptive dynamic optimization , 2003, International Symposium on Code Generation and Optimization, 2003. CGO 2003..

[42]  Thomas F. Wenisch,et al.  Practical off-chip meta-data for temporal memory streaming , 2009, 2009 IEEE 15th International Symposium on High Performance Computer Architecture.

[43]  Josep Torrellas,et al.  Using a user-level memory thread for correlation prefetching , 2002, ISCA.

[44]  Kei Hiraki,et al.  Access Map Pattern Matching for High Performance Data Cache Prefetch , 2011, J. Instr. Level Parallelism.

[45]  Babak Falsafi,et al.  Accurate and complexity-effective spatial pattern prediction , 2004, 10th International Symposium on High Performance Computer Architecture (HPCA'04).

[46]  Daniel A. Jiménez,et al.  Multiperspective Reuse Prediction , 2017, 2017 50th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO).

[47]  Norman P. Jouppi,et al.  Improving direct-mapped cache performance by the addition of a small fully-associative cache and prefetch buffers , 1990, [1990] Proceedings. The 17th Annual International Symposium on Computer Architecture.

[48]  Geoffrey E. Hinton,et al.  Adaptive Mixtures of Local Experts , 1991, Neural Computation.

[49]  Hamid Sarbazi-Azad,et al.  Domino Temporal Data Prefetcher , 2018, 2018 IEEE International Symposium on High Performance Computer Architecture (HPCA).

[50]  K.J. Nesbit,et al.  AC/DC: an adaptive data cache prefetcher , 2004, Proceedings. 13th International Conference on Parallel Architecture and Compilation Techniques, 2004. PACT 2004..

[51]  Dam Sunwoo,et al.  Temporal Prefetching Without the Off-Chip Metadata , 2019, MICRO.

[52]  Johannes Fürnkranz,et al.  Large-Scale Multi-label Text Classification - Revisiting Neural Networks , 2013, ECML/PKDD.

[53]  Hamid Sarbazi-Azad,et al.  Bingo Spatial Data Prefetcher , 2019, 2019 IEEE International Symposium on High Performance Computer Architecture (HPCA).

[54]  Lukasz Kaiser,et al.  Attention is All you Need , 2017, NIPS.

[55]  Daniel A. Jiménez,et al.  Multiperspective Perceptron Predictor with TAGE , 2016 .

[56]  David A. Patterson,et al.  The GAP Benchmark Suite , 2015, ArXiv.

[57]  Thomas F. Wenisch,et al.  Temporal streams in commercial server applications , 2008, 2008 IEEE International Symposium on Workload Characterization.

[58]  Calvin Lin,et al.  Applying Deep Learning to the Cache Replacement Problem , 2019, MICRO.

[59]  Christoforos E. Kozyrakis,et al.  Learning Memory Access Patterns , 2018, ICML.