Cache Replacement Policy Based on Expected Hit Count

Memory-intensive workloads operate on massive amounts of data that cannot be captured by the last-level caches (LLCs) of modern processors. Consequently, processors encounter frequent off-chip misses and lose significant performance potential. One component of a modern processor with a prominent influence on off-chip miss traffic is the LLC's replacement policy. Existing processors employ a variation of the least-recently-used (LRU) policy to determine the victim for replacement. Unfortunately, there is a large gap between the performance of LRU and that of Belady's MIN, the optimal replacement policy. Belady's MIN selects the victim with the longest reuse distance and is therefore infeasible, as it requires knowledge of the future. In this work, we observe that there is a strong correlation between the expected number of hits of a cache block and the reciprocal of its reuse distance. Taking advantage of this observation, we improve the efficiency of last-level caches through a low-cost yet effective replacement policy. We propose a hit-count-based victim-selection procedure on top of existing low-cost replacement policies that significantly improves the quality of victim selection in last-level caches without a commensurate area overhead. Our proposal offers a 12.2 percent performance improvement over the baseline LRU in a multi-core processor and outperforms EVA, the state-of-the-art replacement policy.

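The abstract describes layering a hit-count-based victim-selection step on top of an existing low-cost base policy. Below is a minimal sketch of that general idea for a single cache set, assuming an LRU base policy, a small saturating per-block hit counter, and LRU order as the tie-breaker; the class name CacheSet, the constant HIT_SATURATION, the counter width, and the tie-breaking rule are illustrative assumptions, not details taken from the paper.

from collections import OrderedDict

# Assumed 2-bit saturating hit counter per block (an illustrative choice,
# not a parameter taken from the paper).
HIT_SATURATION = 3


class CacheSet:
    """One set of a set-associative cache: LRU recency order plus a
    per-block saturating hit counter consulted at victim-selection time."""

    def __init__(self, ways):
        self.ways = ways
        # tag -> observed hit count; dict order doubles as recency order
        # (least recently used first, most recently used last).
        self.blocks = OrderedDict()

    def access(self, tag):
        """Return True on a hit, False on a miss (the block is then filled)."""
        if tag in self.blocks:
            # Hit: bump the saturating hit counter and refresh recency.
            self.blocks[tag] = min(self.blocks[tag] + 1, HIT_SATURATION)
            self.blocks.move_to_end(tag)
            return True
        if len(self.blocks) >= self.ways:
            self._evict()
        self.blocks[tag] = 0  # a freshly filled block has no observed hits
        return False

    def _evict(self):
        # Victim selection: prefer the block with the fewest observed hits
        # (a proxy for a long expected reuse distance); ties fall back to
        # LRU order because min() returns the first minimum in recency order.
        victim = min(self.blocks, key=lambda t: self.blocks[t])
        del self.blocks[victim]


if __name__ == "__main__":
    s = CacheSet(ways=2)
    trace = ["A", "A", "B", "C", "A"]
    print([s.access(t) for t in trace])
    # [False, True, False, False, True]: at the access to C, the zero-hit
    # block B is evicted instead of the LRU block A, so the final access
    # to A still hits.

In the short trace above, a plain LRU policy would evict A at the access to C and the final access to A would miss; preferring the zero-hit block B as the victim is the kind of decision a hit-count heuristic is intended to make.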