Compression in cache design

Increasing cache capacity via compression enables designers to improve the performance of existing designs at small incremental cost, further leveraging the large die area invested in last-level caches. This paper explores the compressed cache design space with a focus on implementation feasibility. Our compression schemes use companion line pairs -- cache lines whose addresses differ by a single bit -- as candidates for compression. We propose two novel compressed cache organizations: the companion bit remapped cache and the pseudo-associative cache. Both organizations use a fixed-width physical cache line implementation while providing a variable-length logical cache line organization, without changing the number of sets or ways and with only a minimal increase in state per tag. We evaluate banked and pairwise schemes as two alternatives for storing compressed companion pairs within a physical cache line. We also evaluate companion line prefetching (CLP), a simple yet effective prefetching mechanism that works in conjunction with our compression scheme; CLP is nearly pollution free since it only prefetches lines that are compression candidates. Using a detailed cycle-accurate IA-32 simulator, we measure the performance of several third-level compressed cache designs on a representative collection of workloads. Our experiments show that our cache compression designs improve IPC for all cache-sensitive workloads, even those with modest data compressibility. The pairwise pseudo-associative compressed cache organization with companion line prefetching is the best configuration, providing a mean IPC improvement of 19% for cache-sensitive workloads and a best-case IPC improvement of 84%. Finally, our cache designs exhibit negligible overall IPC degradation for cache-insensitive workloads.
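For concreteness, the sketch below illustrates the companion-pair idea in C: the companion of a line is found by flipping one address bit, and a pair can share a single physical line slot only if both halves compress enough to fit. The 64-byte line size, the choice of the bit just above the line offset as the companion bit, and the compressed_size() helper are assumptions made for this example, not details taken from the paper.

```c
#include <stdint.h>
#include <stdbool.h>
#include <stddef.h>

/* Illustrative sketch only: companion lines are cache lines whose
 * addresses differ in exactly one bit.  Which bit is used and the
 * 64-byte line size are assumptions for this example. */

#define LINE_BYTES       64u                       /* assumed line size        */
#define LINE_OFFSET_BITS 6u                        /* log2(LINE_BYTES)         */
#define COMPANION_BIT    (1ull << LINE_OFFSET_BITS) /* bit above the offset    */

/* Address of a line's companion: flip the single companion bit. */
static inline uint64_t companion_addr(uint64_t line_addr)
{
    return line_addr ^ COMPANION_BIT;
}

/* Stand-in for whatever hardware compressor is used; hypothetical here. */
extern size_t compressed_size(const uint8_t *line, size_t len);

/* Two companions can share one fixed-width physical line slot only if
 * their compressed forms together fit within one physical line. */
static bool pair_fits_in_one_slot(const uint8_t *a, const uint8_t *b)
{
    return compressed_size(a, LINE_BYTES) + compressed_size(b, LINE_BYTES)
           <= LINE_BYTES;
}
```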
