Residue cache: A low-energy low-area L2 cache architecture via compression and partial hits

L2 cache memories are increasingly adopted in embedded systems for high performance, but their large size increases energy consumption. We propose a low-energy, low-area L2 cache architecture that matches the performance of a conventional L2 cache architecture while using 53% less area and around 40% less energy. The architecture consists of an L2 cache and a small additional cache called the residue cache. Lines in both the L2 cache and the residue cache are half the size of a conventional L2 cache line. Conventional L2 cache lines that compress well are stored only in the L2 cache, while lines that compress poorly are stored across both the L2 and residue caches. Although many conventional L2 cache lines are not fully captured by the residue cache, most accesses to them do not incur misses because not all of their words are needed immediately; we term such accesses partial hits. The residue cache architecture uses much less energy and area than conventional L2 cache architectures, and can be combined synergistically with other schemes such as line distillation and ZCA. The residue cache architecture is also shown to perform well on a 4-way superscalar processor of the kind typically used in high-performance systems.
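
The split between the L2 cache and the residue cache, and the notion of a partial hit, can be illustrated with a small sketch. This is not the paper's implementation: the 64-byte conventional line, the 4-byte word, the toy size-based compression rule in `compressed_size`, and the function names are all assumptions made for illustration. The sketch only shows the idea that a line whose compressed form fits in a half-sized L2 line needs no residue, and that an access to a word held in the L2 part can be served even when the residue part is absent.

```python
from typing import List

# All sizes below are illustrative assumptions, not values from the paper.
WORD_BYTES = 4
LINE_WORDS = 16                                   # assumed 64-byte conventional line
HALF_LINE_BYTES = (LINE_WORDS * WORD_BYTES) // 2  # half-sized L2 line: 32 bytes


def compressed_size(word: int) -> int:
    """Toy compression rule (hypothetical): small values take fewer bytes."""
    if word == 0:
        return 0
    if word < (1 << 8):
        return 1
    if word < (1 << 16):
        return 2
    return WORD_BYTES


def words_in_l2(words: List[int]) -> int:
    """Number of leading words whose compressed form fits in the half-sized L2 line."""
    used = 0
    for index, word in enumerate(words):
        size = compressed_size(word)
        if used + size > HALF_LINE_BYTES:
            return index          # the remaining words would go to the residue cache
        used += size
    return len(words)             # well-compressed line: no residue needed


def classify_access(words: List[int], word_index: int, residue_present: bool) -> str:
    """Classify an access to one word of a conventional line."""
    in_l2 = words_in_l2(words)
    if in_l2 == len(words) or residue_present:
        return "hit"              # the whole line is available
    if word_index < in_l2:
        return "partial hit"      # residue absent, but the requested word is in L2
    return "miss"                 # the requested word was in the absent residue


if __name__ == "__main__":
    poorly_compressed = [0xFFFFFFFF] * LINE_WORDS
    print(classify_access(poorly_compressed, 3, residue_present=False))
    print(classify_access(poorly_compressed, 12, residue_present=False))
```

In this toy model, a line of all-zero words fits entirely in the L2 half line, while a line of incompressible words leaves its second half to the residue cache, so only accesses to that second half depend on the residue being present.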
