Cache Replacement Policy Based on Expected Hit Count

Memory-intensive workloads operate on massive amounts of data that cannot be captured by the last-level caches (LLCs) of modern processors. Consequently, processors encounter frequent off-chip misses and lose significant performance potential. One component of a modern processor with a prominent influence on off-chip miss traffic is the LLC's replacement policy. Existing processors employ a variation of the least-recently-used (LRU) policy to determine the victim for replacement. Unfortunately, there is a large gap between the performance of LRU and that of Belady's MIN, the optimal replacement policy. Belady's MIN selects the victim with the longest reuse distance and is therefore infeasible, as it requires knowledge of the future. In this work, we observe that there is a strong correlation between the expected number of hits of a cache block and the reciprocal of its reuse distance. Taking advantage of this observation, we improve the efficiency of last-level caches through a low-cost yet effective replacement policy. We propose a hit-count-based victim-selection procedure on top of existing low-cost replacement policies that significantly improves the quality of victim selection in last-level caches without a commensurate area overhead. Our proposal offers a 12.2 percent performance improvement over the baseline LRU in a multi-core processor and outperforms EVA, the state-of-the-art replacement policy.

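The abstract describes layering a hit-count-based victim-selection step on top of an existing low-cost base policy. Below is a minimal sketch of that general idea for a single cache set, assuming an LRU base policy, a small saturating per-block hit counter, and LRU order as the tie-breaker; the class name CacheSet, the constant HIT_SATURATION, the counter width, and the tie-breaking rule are illustrative assumptions, not details taken from the paper.

from collections import OrderedDict

# Assumed 2-bit saturating hit counter per block (an illustrative choice,
# not a parameter taken from the paper).
HIT_SATURATION = 3


class CacheSet:
    """One set of a set-associative cache: LRU recency order plus a
    per-block saturating hit counter consulted at victim-selection time."""

    def __init__(self, ways):
        self.ways = ways
        # tag -> observed hit count; dict order doubles as recency order
        # (least recently used first, most recently used last).
        self.blocks = OrderedDict()

    def access(self, tag):
        """Return True on a hit, False on a miss (the block is then filled)."""
        if tag in self.blocks:
            # Hit: bump the saturating hit counter and refresh recency.
            self.blocks[tag] = min(self.blocks[tag] + 1, HIT_SATURATION)
            self.blocks.move_to_end(tag)
            return True
        if len(self.blocks) >= self.ways:
            self._evict()
        self.blocks[tag] = 0  # a freshly filled block has no observed hits
        return False

    def _evict(self):
        # Victim selection: prefer the block with the fewest observed hits
        # (a proxy for a long expected reuse distance); ties fall back to
        # LRU order because min() returns the first minimum in recency order.
        victim = min(self.blocks, key=lambda t: self.blocks[t])
        del self.blocks[victim]


if __name__ == "__main__":
    s = CacheSet(ways=2)
    trace = ["A", "A", "B", "C", "A"]
    print([s.access(t) for t in trace])
    # [False, True, False, False, True]: at the access to C, the zero-hit
    # block B is evicted instead of the LRU block A, so the final access
    # to A still hits.

In the short trace above, a plain LRU policy would evict A at the access to C and the final access to A would miss; preferring the zero-hit block B as the victim is the kind of decision a hit-count heuristic is intended to make.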