LACS: A Locality-Aware Cost-Sensitive Cache Replacement Algorithm

The design of an effective last-level cache (LLC) in general, and of an effective cache replacement/partitioning algorithm in particular, is critical to overall system performance. The processor's ability to hide the LLC miss penalty differs widely from one miss to another. The more instructions the processor manages to issue during a miss, the better it hides the miss penalty and the lower the cost of that miss. This nonuniformity in the processor's ability to hide LLC miss latencies, and the resulting nonuniformity in the performance impact of LLC misses, opens up an opportunity for a new cost-sensitive cache replacement algorithm. This paper makes two key contributions. First, it proposes a framework for estimating the costs of cache blocks at run-time based on the processor's ability to (partially) hide their miss latencies. Second, it proposes a simple, low-hardware-overhead, yet effective cache replacement algorithm that is locality-aware and cost-sensitive (LACS). LACS is thoroughly evaluated using a detailed simulation environment. LACS speeds up 12 LLC-performance-constrained SPEC CPU2006 benchmarks by up to 51%, and by 11% on average. When evaluated using a dual/quad-core CMP with a shared LLC, LACS significantly outperforms LRU in terms of performance and fairness, achieving improvements of up to 54%.
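
The idea of combining a per-block cost estimate with recency can be illustrated with a minimal sketch. This is not the LACS algorithm as specified in the paper; the abstract does not give its details. The sketch assumes that a block counts as low-cost when at least `ISSUE_THRESHOLD` instructions were issued during its miss, and that cost is only consulted among the older half of a set's blocks so that recently used blocks are retained. The names `Block`, `CacheSet`, and `ISSUE_THRESHOLD`, the threshold value, and the older-half rule are all hypothetical choices made for illustration.

```python
from dataclasses import dataclass

# Hypothetical cut-off: a miss that overlapped with at least this many
# issued instructions is treated as cheap (its latency was mostly hidden).
ISSUE_THRESHOLD = 32


@dataclass
class Block:
    tag: int
    issued_during_miss: int  # instructions issued while the miss was outstanding
    last_use: int = 0        # logical timestamp of the most recent access

    @property
    def low_cost(self) -> bool:
        return self.issued_during_miss >= ISSUE_THRESHOLD


class CacheSet:
    """One set of a set-associative cache (illustrative policy only)."""

    def __init__(self, ways: int):
        self.ways = ways
        self.blocks = {}  # tag -> Block
        self.clock = 0

    def access(self, tag: int, issued_during_miss: int = 0):
        """Access a block; return the evicted tag, or None if nothing was evicted.

        issued_during_miss is only used when the access misses.
        """
        self.clock += 1
        block = self.blocks.get(tag)
        if block is not None:  # hit: refresh recency only
            block.last_use = self.clock
            return None
        victim_tag = None
        if len(self.blocks) >= self.ways:
            victim_tag = self._pick_victim().tag
            del self.blocks[victim_tag]
        self.blocks[tag] = Block(tag, issued_during_miss, self.clock)
        return victim_tag

    def _pick_victim(self) -> Block:
        by_age = sorted(self.blocks.values(), key=lambda b: b.last_use)
        # Locality: only the older half of the set is eligible for a
        # cost-based choice, so recently used blocks are kept.
        older_half = by_age[: max(1, self.ways // 2)]
        for candidate in older_half:
            if candidate.low_cost:  # cost: prefer evicting a cheap-to-miss block
                return candidate
        return by_age[0]  # fall back to plain LRU


cache_set = CacheSet(ways=4)
cache_set.access(1, issued_during_miss=5)    # high-cost block
cache_set.access(2, issued_during_miss=100)  # low-cost block
cache_set.access(3, issued_during_miss=5)
cache_set.access(4, issued_during_miss=100)
first_victim = cache_set.access(5, issued_during_miss=5)
second_victim = cache_set.access(6, issued_during_miss=5)
```

In this run, plain LRU would evict block 1 first, whereas the sketch evicts the low-cost block 2 and only falls back to LRU order once no low-cost block remains among the older blocks.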
