ECM: Effective Capacity Maximizer for high-performance compressed caching

Compressed Last-Level Cache (LLC) architectures have been proposed to enhance system performance by efficiently increasing the effective capacity of the cache, without physically increasing the cache size. In a compressed cache, the cacheline size varies depending on the achieved compression ratio. We observe that this size information gives a useful hint when selecting a victim, which can lead to increased cache performance. However, no replacement policy tailored to compressed LLCs has been investigated so far. This paper introduces the notion of size-aware compressed cache management as a way to maximize the performance of compressed caches. Toward this end, the Effective Capacity Maximizer (ECM) scheme is introduced, which targets compressed LLCs. The proposed mechanism revolves around three fundamental principles: Size-Aware Insertion (SAI), a Dynamically Adjustable Threshold Scheme (DATS), and Size-Aware Replacement (SAR). By adjusting the eviction criteria, based on the compressed data size, one may increase the effective cache capacity and minimize the miss penalty. Extensive simulations with memory traces from real applications running on a full-system simulator demonstrate significant improvements compared to compressed cache schemes employing the conventional Least-Recently Used (LRU) and Dynamic Re-Reference Interval Prediction (DRRIP) [11] replacement policies. Specifically, ECM shows an average effective capacity increase of 15% over LRU and 18.8% over DRRIP, an average cache miss reduction of 9.4% over LRU and 3.9% over DRRIP, and an average system performance improvement of 6.2% over LRU and 3.3% over DRRIP.

[1]  Xi Chen,et al.  C-Pack: A High-Performance Microprocessor Cache Compression Algorithm , 2010, IEEE Transactions on Very Large Scale Integration (VLSI) Systems.

[2]  Gabriel H. Loh,et al.  Thread-aware dynamic shared cache compression in multi-core processors , 2011, 2011 IEEE 29th International Conference on Computer Design (ICCD).

[3]  Milo M. K. Martin,et al.  Multifacet's general execution-driven multiprocessor simulator (GEMS) toolset , 2005, CARN.

[4]  David A. Wood,et al.  Frequent Pattern Compression: A Significance-Based Compression Scheme for L2 Caches , 2004 .

[5]  Aamer Jaleel,et al.  High performance cache replacement using re-reference interval prediction (RRIP) , 2010, ISCA.

[6]  David A. Wood,et al.  Adaptive cache compression for high-performance processors , 2004, Proceedings. 31st Annual International Symposium on Computer Architecture, 2004..

[7]  Aamer Jaleel,et al.  Adaptive insertion policies for high performance caching , 2007, ISCA '07.

[8]  Fredrik Larsson,et al.  Simics: A Full System Simulation Platform , 2002, Computer.

[9]  Jang-Soo Lee,et al.  An on-chip cache compression technique to reduce decompression overhead and design complexity , 2000, J. Syst. Archit..

[10]  Soontae Kim,et al.  Residue cache: A low-energy low-area L2 cache architecture via compression and partial hits , 2011, 2011 44th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO).

[11]  Steven K. Reinhardt,et al.  A unified compressed memory hierarchy , 2005, 11th International Symposium on High-Performance Computer Architecture.

[12]  Ali-Reza Adl-Tabatabai,et al.  Compression in cache design , 2007, ICS '07.

[13]  Aamer Jaleel,et al.  Adaptive insertion policies for managing shared caches , 2008, 2008 International Conference on Parallel Architectures and Compilation Techniques (PACT).

[14]  Barton P. Miller,et al.  Vulnerability Assessment Enhancement for Middleware for Computing and Informatics , 2013 .

[16]  Steven K. Reinhardt,et al.  A compressed memory hierarchy using an indirect index cache , 2004, WMPI '04.

[17]  Jaehyuk Huh,et al.  Cache bursts: A new approach for eliminating dead blocks and increasing cache efficiency , 2008, 2008 41st IEEE/ACM International Symposium on Microarchitecture.

[18]  Krste Asanovic,et al.  Dynamic zero compression for cache energy reduction , 2000, MICRO 33.

[19]  Jun Yang,et al.  Frequent value compression in data caches , 2000, Proceedings 33rd Annual IEEE/ACM International Symposium on Microarchitecture. MICRO-33 2000.

[20]  Carole-Jean Wu,et al.  SHiP: Signature-based Hit Predictor for high performance caching , 2011, 2011 44th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO).

[21]  John T. Robinson,et al.  Parallel compression with cooperative dictionary construction , 1996, Proceedings of Data Compression Conference - DCC '96.

[22]  Mineo Takai,et al.  Parssec: A Parallel Simulation Environment for Complex Systems , 1998, Computer.