Dictionary sharing: An efficient cache compression scheme for compressed caches

The effectiveness of a compressed cache depends on three features: i) the compression scheme, ii) the compaction scheme, and iii) the cache layout of the compressed cache. Skewed compressed cache (SCC) and yet another compressed cache (YACC) are two recently proposed compressed cache layouts that feature minimal storage and latency overheads, while achieving comparable performance over more complex compressed cache layouts. Both SCC and YACC use compression techniques to compress individual cache blocks, and then a compaction technique to compact multiple contiguous compressed blocks into a single data entry. The primary attribute used by these techniques for compaction is the compression factor of the cache blocks, and in this process, they waste cache space. In this paper, we propose dictionary sharing (DISH), a dictionary based cache compression scheme that reduces this wastage. DISH compresses a cache block by keeping in mind that the block is a potential candidate for the compaction process. DISH encodes a cache block with a dictionary that stores the distinct 4-byte chunks of a cache block and the dictionary is shared among multiple neighboring cache blocks. The simple encoding scheme of DISH also provides a single cycle decompression latency and it does not change the cache layout of compressed caches. Compressed cache layouts that use DISH outperforms the compression schemes, such as BDI and CPACK+Z, in terms of compression ratio (from 1.7X to 2.3X), system performance (from 7.2% to 14.3%), and energy efficiency (from 6% to 16%).

[1]  Samira Manabi Khan,et al.  Last-level cache deduplication , 2014, ICS '14.

[2]  Onur Mutlu,et al.  Base-delta-immediate compression: Practical data compression for on-chip caches , 2012, 2012 21st International Conference on Parallel Architectures and Compilation Techniques (PACT).

[3]  Somayeh Sardashti,et al.  Yet Another Compressed Cache , 2016, ACM Trans. Archit. Code Optim..

[4]  Xi Chen,et al.  C-Pack: A High-Performance Microprocessor Cache Compression Algorithm , 2010, IEEE Transactions on Very Large Scale Integration (VLSI) Systems.

[5]  Cloyce D. Spradling SPEC CPU2006 benchmark tools , 2007, CARN.

[6]  Onur Mutlu,et al.  Exploiting compressed block size as an indicator of future reuse , 2015, 2015 IEEE 21st International Symposium on High Performance Computer Architecture (HPCA).

[7]  André Seznec,et al.  Zero-content augmented caches , 2009, ICS '09.

[8]  Aamer Jaleel,et al.  Adaptive insertion policies for managing shared caches , 2008, 2008 International Conference on Parallel Architectures and Compilation Techniques (PACT).

[9]  Somayeh Sardashti,et al.  Decoupled compressed cache: Exploiting spatial locality for energy-optimized compressed caching , 2013, 2013 46th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO).

[10]  Onur Mutlu,et al.  Feedback Directed Prefetching: Improving the Performance and Bandwidth-Efficiency of Hardware Prefetchers , 2007, 2007 IEEE 13th International Symposium on High Performance Computer Architecture.

[11]  Somayeh Sardashti,et al.  The gem5 simulator , 2011, CARN.

[12]  Dean M. Tullsen,et al.  Symbiotic jobscheduling for a simultaneous mutlithreading processor , 2000, SIGP.

[13]  R. Govindarajan,et al.  Probabilistic Shared Cache Management (PriSM) , 2012, 2012 39th Annual International Symposium on Computer Architecture (ISCA).

[14]  David A. Wood,et al.  Frequent Pattern Compression: A Significance-Based Compression Scheme for L2 Caches , 2004 .

[15]  Somayeh Sardashti,et al.  Skewed Compressed Caches , 2014, 2014 47th Annual IEEE/ACM International Symposium on Microarchitecture.

[16]  Jun Yang,et al.  Frequent value compression in data caches , 2000, MICRO 33.

[17]  Per Stenström,et al.  SC2: A statistical compression cache scheme , 2014, 2014 ACM/IEEE 41st International Symposium on Computer Architecture (ISCA).

[18]  Per Stenström,et al.  HyComp: A hybrid cache compression method for selection of data-type-specific compression methods , 2015, 2015 48th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO).

[19]  Jongman Kim,et al.  ECM: Effective Capacity Maximizer for high-performance compressed caching , 2013, 2013 IEEE 19th International Symposium on High Performance Computer Architecture (HPCA).

[20]  M. Ekman,et al.  A robust main-memory compression scheme , 2005, 32nd International Symposium on Computer Architecture (ISCA'05).

[21]  Omer Khan,et al.  CRONO: A Benchmark Suite for Multithreaded Graph Algorithms Executing on Futuristic Multicores , 2015, 2015 IEEE International Symposium on Workload Characterization.

[22]  Christoforos E. Kozyrakis,et al.  Vantage: Scalable and efficient fine-grain cache partitioning , 2011, 2011 38th Annual International Symposium on Computer Architecture (ISCA).

[23]  N. Muralimanohar,et al.  CACTI 6 . 0 : A Tool to Understand Large Caches , 2007 .

[24]  A. Snavely,et al.  Symbiotic jobscheduling for a simultaneous mutlithreading processor , 2000, SIGP.

[25]  Aamer Jaleel,et al.  High performance cache replacement using re-reference interval prediction (RRIP) , 2010, ISCA.

[26]  Reena Panda,et al.  B-Fetch: Branch Prediction Directed Prefetching for Chip-Multiprocessors , 2014, 2014 47th Annual IEEE/ACM International Symposium on Microarchitecture.

[27]  David A. Wood,et al.  Adaptive cache compression for high-performance processors , 2004, Proceedings. 31st Annual International Symposium on Computer Architecture, 2004..