Path confidence based lookahead prefetching

Designing prefetchers to maximize system performance often requires a delicate balance between coverage and accuracy. Achieving both high coverage and accuracy is particularly challenging in workloads with complex address patterns, which may require large amounts of history to accurately predict future addresses. This paper describes the Signature Path Prefetcher (SPP), which offers effective solutions for three classic challenges in prefetcher design. First, SPP uses a compressed history based scheme that accurately predicts complex address patterns. Second, unlike other history based algorithms, which miss out on many prefetching opportunities when address patterns make a transition between physical pages, SPP tracks complex patterns across physical page boundaries and continues prefetching as soon as they move to new pages. Finally, SPP uses the confidence it has in its predictions to adaptively throttle itself on a per-prefetch stream basis. In our analysis, we find that SPP improves performance by 27.2% over a no-prefetching baseline, and outperforms the state-of-the-art Best Offset prefetcher by 6.4%. SPP does this with minimal overhead, operating strictly in the physical address space, and without requiring any additional processor core state, such as the PC.

[1]  Babak Falsafi,et al.  Clearing the clouds: a study of emerging scale-out workloads on modern hardware , 2012, ASPLOS XVII.

[2]  Onur Mutlu,et al.  Runahead execution: an alternative to very large instruction windows for out-of-order processors , 2003, The Ninth International Symposium on High-Performance Computer Architecture, 2003. HPCA-9 2003. Proceedings..

[3]  Pierre Michaud A Best-Offset Prefetcher , 2015 .

[4]  Pierre Michaud Best-offset hardware prefetching , 2016, 2016 IEEE International Symposium on High Performance Computer Architecture (HPCA).

[5]  Natalie D. Enright Jerger,et al.  Friendly fire: understanding the effects of multiprocessor prefetches , 2006, 2006 IEEE International Symposium on Performance Analysis of Systems and Software.

[6]  Calvin Lin,et al.  Memory Prefetching Using Adaptive Stream Detection , 2006, 2006 39th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO'06).

[7]  Aamer Jaleel,et al.  Sandbox Prefetching: Safe run-time evaluation of aggressive prefetchers , 2014, 2014 IEEE 20th International Symposium on High Performance Computer Architecture (HPCA).

[8]  J.W.C. Fu,et al.  Stride Directed Prefetching In Scalar Processors , 1992, [1992] Proceedings the 25th Annual International Symposium on Microarchitecture MICRO 25.

[9]  Alan Jay Smith,et al.  Sequential Program Prefetching in Memory Hierarchies , 1978, Computer.

[10]  Onur Mutlu,et al.  Feedback Directed Prefetching: Improving the Performance and Bandwidth-Efficiency of Hardware Prefetchers , 2007, 2007 IEEE 13th International Symposium on High Performance Computer Architecture.

[11]  Sally A. McKee,et al.  Hitting the memory wall: implications of the obvious , 1995, CARN.

[12]  James Cownie,et al.  PinPlay: a framework for deterministic replay and reproducible analysis of parallel programs , 2010, CGO '10.

[13]  Brad Calder,et al.  Using SimPoint for accurate and efficient simulation , 2003, SIGMETRICS '03.

[14]  James E. Smith,et al.  Data Cache Prefetching Using a Global History Buffer , 2005, IEEE Micro.

[15]  Jean-Loup Baer,et al.  Effective Hardware Based Data Prefetching for High-Performance Processors , 1995, IEEE Trans. Computers.

[16]  Seth H. Pugsley,et al.  Efficiently prefetching complex address patterns , 2015, 2015 48th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO).

[17]  Kei Hiraki,et al.  Unified memory optimizing architecture: memory subsystem control with a unified predictor , 2012, ICS '12.

[18]  Thomas F. Wenisch,et al.  Spatio-temporal memory streaming , 2009, ISCA '09.

[19]  Kei Hiraki,et al.  Access Map Pattern Matching for High Performance Data Cache Prefetch , 2011, J. Instr. Level Parallelism.

[20]  Reena Panda,et al.  B-Fetch: Branch Prediction Directed Prefetching for Chip-Multiprocessors , 2014, 2014 47th Annual IEEE/ACM International Symposium on Microarchitecture.

[21]  Jan Reineke,et al.  Computer Aided Verification , 2017, Lecture Notes in Computer Science.

[22]  Calvin Lin,et al.  Linearizing irregular memory accesses for improved correlated prefetching , 2013, 2013 46th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO).

[23]  Thomas F. Wenisch,et al.  Spatial Memory Streaming , 2006, 33rd International Symposium on Computer Architecture (ISCA'06).

[24]  K.J. Nesbit,et al.  AC/DC: an adaptive data cache prefetcher , 2004, Proceedings. 13th International Conference on Parallel Architecture and Compilation Techniques, 2004. PACT 2004..