Prefetching Using Markov Predictors

Prefetching is one approach to reducing the latency of memory operations in modern computer systems. In this paper, we describe the Markov prefetcher. This prefetcher acts as an interface between the on-chip and off-chip cache, and can be added to existing computer designs. The Markov prefetcher is distinguished by prefetching multiple reference predictions from the memory subsystem, and then prioritizing the delivery of those references to the processor.This design results in a prefetching system that provides good coverage, is accurate and produces timely results that can be effectively used by the processor. In our cycle-level simulations, the Markov Prefetcher reduces the overall execution stalls due to instruction and data memory operations by an average of 54% for various commercial benchmarks while only using two thirds the memory of a demand-fetch cache organization.

[1]  Thomas Alexander,et al.  Distributed prefetch-buffer/cache design for high performance memory systems , 1996, Proceedings. Second International Symposium on High-Performance Computer Architecture.

[2]  Luddy Harrison Examination of a memory access classification scheme for pointer-intensive and numeric programs , 1996, ICS '96.

[3]  Sharad Mehrotra,et al.  Data prefetch mechanisms for accelerating symbolic and numeric computation , 1996 .

[4]  Mikko H. Lipasti,et al.  SPAID: software prefetching in pointer- and call-intensive environments , 1995, Proceedings of the 28th Annual International Symposium on Microarchitecture.

[5]  Gary S. Tyson,et al.  A modified approach to data cache management , 1995, Proceedings of the 28th Annual International Symposium on Microarchitecture.

[6]  J. Torrellas,et al.  Speeding up irregular applications in shared-memory multiprocessors: memory binding and group prefetching , 1995, Proceedings 22nd Annual International Symposium on Computer Architecture.

[7]  T. Ozawa,et al.  Cache miss heuristics and preloading techniques for general-purpose programs , 1995, Proceedings of the 28th Annual International Symposium on Microarchitecture.

[8]  Ann Marie Grizzaffi Maynard,et al.  Contrasting characteristics and cache performance of technical and multi-user commercial workloads , 1994, ASPLOS VI.

[9]  N. Jouppi,et al.  Complexity/performance tradeoffs with non-blocking loads , 1994, Proceedings of 21 International Symposium on Computer Architecture.

[10]  Richard E. Kessler,et al.  Evaluating stream buffers as a secondary cache replacement , 1994, Proceedings of 21 International Symposium on Computer Architecture.

[11]  Zarka Cvetanovic,et al.  Characterization of Alpha AXP performance using TP and SPEC workloads , 1994, Proceedings of 21 International Symposium on Computer Architecture.

[12]  Anoop Gupta,et al.  Design and evaluation of a compiler algorithm for prefetching , 1992, ASPLOS V.

[13]  Jean-Loup Baer,et al.  Reducing memory latency via non-blocking and prefetching caches , 1992, ASPLOS V.

[14]  H. Levy,et al.  An architecture for software-controlled data prefetching , 1991, [1991] Proceedings. The 18th Annual International Symposium on Computer Architecture.

[15]  Norman P. Jouppi,et al.  Improving direct-mapped cache performance by the addition of a small fully-associative cache and pre , 1990, ISCA 1990.

[16]  Norman P. Jouppi,et al.  Improving direct-mapped cache performance by the addition of a small fully-associative cache and prefetch buffers , 1990, [1990] Proceedings. The 17th Annual International Symposium on Computer Architecture.

[17]  Jean-Loup Baer,et al.  Dynamic Improvement of Locality in Virtual Memory Systems , 1976, IEEE Transactions on Software Engineering.