Thesaurus: Efficient Cache Compression via Dynamic Clustering

In this paper, we identify a previously untapped source of compressibility in cache working sets: clusters of cachelines that are similar, but not identical, to one another. To compress the cache, we can then store the "clusteroid" of each cluster together with the (much smaller) "diffs" needed to reconstruct the rest of the cluster. To exploit this opportunity, we propose a hardware-level on-line cacheline clustering mechanism based on locality-sensitive hashing. Our method dynamically forms clusters as they appear in the data access stream and retires them as they disappear from the cache. Our evaluations show that we achieve 2.25× compression on average (and up to 9.9×) on the SPEC CPU 2017 suite, which is significantly higher than prior proposals scaled to an iso-silicon budget.
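The clusteroid-plus-diff idea can be illustrated with a small software sketch. This is not the paper's hardware design. The 64-byte line size, the 12-bit fingerprint, the sparse sign-projection hash, the byte-granularity diff format, and all names (`make_projections`, `fingerprint`, `ClusterCache`) are assumptions chosen for illustration. The first line that maps to a fingerprint becomes the clusteroid, later lines with the same fingerprint are stored as diffs against it, and a cluster is retired once its last line is evicted.

```python
import random

LINE_BYTES = 64   # assumed cacheline size
HASH_BITS = 12    # assumed fingerprint width (hypothetical)

def make_projections(seed=0):
    """Build one sparse random row of {-1, 0, +1} weights per fingerprint bit."""
    rng = random.Random(seed)
    return [[rng.choice((-1, 0, 0, 0, 0, 1)) for _ in range(LINE_BYTES)]
            for _ in range(HASH_BITS)]

def fingerprint(line, projections):
    """Locality-sensitive hash: keep only the sign of each random projection."""
    bits = 0
    for row in projections:
        dot = sum(w * b for w, b in zip(row, line))
        bits = (bits << 1) | (1 if dot > 0 else 0)
    return bits

def byte_diff(base, line):
    """List the (offset, value) pairs where `line` differs from `base`."""
    return [(i, b) for i, (a, b) in enumerate(zip(base, line)) if a != b]

def apply_diff(base, diff):
    """Reconstruct a line from its clusteroid and its diff."""
    out = bytearray(base)
    for i, b in diff:
        out[i] = b
    return bytes(out)

class ClusterCache:
    """Toy model: one clusteroid per fingerprint, every line stored as a diff."""

    def __init__(self, projections):
        self.projections = projections
        self.clusteroids = {}  # fingerprint -> clusteroid bytes
        self.refs = {}         # fingerprint -> number of resident lines
        self.lines = {}        # address -> (fingerprint, diff)

    def insert(self, addr, line):
        if addr in self.lines:
            self.evict(addr)
        fp = fingerprint(line, self.projections)
        # The first line seen for a fingerprint becomes the clusteroid.
        base = self.clusteroids.setdefault(fp, line)
        self.refs[fp] = self.refs.get(fp, 0) + 1
        self.lines[addr] = (fp, byte_diff(base, line))

    def read(self, addr):
        fp, diff = self.lines[addr]
        return apply_diff(self.clusteroids[fp], diff)

    def evict(self, addr):
        fp, _ = self.lines.pop(addr)
        self.refs[fp] -= 1
        if self.refs[fp] == 0:
            # Retire the cluster once no resident line refers to it.
            del self.refs[fp]
            del self.clusteroids[fp]
```

Because the hash is only probabilistically locality-sensitive, two similar lines usually, but not always, share a fingerprint. When they do not, the second line simply starts its own cluster, so reads stay exact either way and only the compression ratio is affected.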
