Victim management in a cache hierarchy

We investigate directions for exploiting what might be termed pattern locality in a cache hierarchy, based on recording cache discards or victims. An advantage of storing discard decisions is the reduced duplication of pertinent information, as well as the maintenance of information on the current location of discarded lines. Typical caches are designed to exploit combinations of temporal and spatial locality. Temporal locality, the likelihood that recently referenced data will be referenced again, is exploited by LRU-like algorithms. Spatial locality is the property that causes larger cache lines to yield improved miss ratios. Here we consider the exploitation of pattern locality--the property that lines accessed in temporal proximity tend to be re-referenced together. We describe some new cache structures including pattern-recording features, along with their miss ratio and transfer traffic performance as determined via simulations on traces drawn from several benchmark applications. We show that pattern locality information, based on discard statistics, can be useful in enhancing the quality of prefetch decisions.

[1]  Sanjeev Kumar,et al.  Exploiting spatial locality in data caches using spatial footprints , 1998, ISCA.

[2]  James R. Goodman,et al.  Hardware techniques to improve the performance of the processor/memory interface , 1998 .

[3]  Wei-Fen Lin,et al.  Reducing DRAM latencies with an integrated memory hierarchy design , 2001, Proceedings HPCA Seventh International Symposium on High-Performance Computer Architecture.

[4]  Vijayalakshmi Srinivasan,et al.  Exploring the limits of prefetching , 2005, IBM J. Res. Dev..

[5]  P. Gregory,et al.  February , 1890, The Hospital.

[6]  Jean-Loup Baer,et al.  Pursuing the performance potential of dynamic cache line sizes , 1999, Proceedings 1999 IEEE International Conference on Computer Design: VLSI in Computers and Processors (Cat. No.99CB37040).

[7]  Mark J. Charney,et al.  Prefetching and memory system behavior of the SPEC95 benchmark suite , 1997, IBM J. Res. Dev..

[8]  Thomas Alexander,et al.  Distributed prefetch-buffer/cache design for high performance memory systems , 1996, Proceedings. Second International Symposium on High-Performance Computer Architecture.

[9]  Norman P. Jouppi,et al.  Improving direct-mapped cache performance by the addition of a small fully-associative cache and prefetch buffers , 1990, [1990] Proceedings. The 17th Annual International Symposium on Computer Architecture.

[10]  Wei-Fen Lin,et al.  Filtering superfluous prefetches using density vectors , 2001, Proceedings 2001 IEEE International Conference on Computer Design: VLSI in Computers and Processors. ICCD 2001.

[11]  Haifeng Yu,et al.  DRAM-page based prediction and prefetching , 2000, Proceedings 2000 International Conference on Computer Design.

[12]  Wen-mei W. Hwu,et al.  Run-time Adaptive Cache Hierarchy Via Reference Analysis , 1997, Conference Proceedings. The 24th Annual International Symposium on Computer Architecture.

[13]  Margaret Martonosi,et al.  TCP: tag correlating prefetchers , 2003, The Ninth International Symposium on High-Performance Computer Architecture, 2003. HPCA-9 2003. Proceedings..

[14]  Mikko H. Lipasti,et al.  Improving multiprocessor performance with coarse-grain coherence tracking , 2005, 32nd International Symposium on Computer Architecture (ISCA'05).

[15]  Brian T. Bennett,et al.  Adaptive Variation of the Transfer Unit in a Storage Hierarchy , 1978, IBM J. Res. Dev..

[16]  Olivier Temam,et al.  An Algorithm for Optimally Exploiting Spatial and Temporal Locality in Upper Memory Levels , 1999, IEEE Trans. Computers.

[17]  Babak Falsafi,et al.  Dead-block prediction & dead-block correlating prefetchers , 2001, ISCA 2001.

[18]  Andreas Moshovos RegionScout: Exploiting Coarse Grain Sharing in Snoop-Based Coherence , 2005, ISCA 2005.