Word-interleaved cache: an energy efficient data cache architecture

We propose a novel energy-efficient data cache architecture, the word-interleaved (WI) cache. In the WI cache, each cache block is distributed uniformly across the cache ways, so that each line of a cache way holds a subset of the block's words. This distribution makes it possible to activate or deactivate cache ways based on the offset of the requested address, thereby reducing the overall cache access energy. For a 4-way set-associative cache of size 16 KB with a 32 B block size, the proposed technique achieves dynamic energy savings of 54.2% without considering fast hits and 62.3% when fast hits are considered, with small performance degradation and negligible area overhead.
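The offset-based way selection described above can be illustrated with a minimal sketch for the 16 KB, 4-way, 32 B configuration. The 4-byte word size, the contiguous word-to-way mapping, and the names `way_for_address` and `ways_to_activate` are assumptions made for illustration only; the abstract does not specify the actual mapping or enable logic.

```python
# Illustrative sketch only: the word size and the word-to-way mapping
# below are assumptions, not details taken from the paper.
CACHE_SIZE = 16 * 1024   # bytes (from the abstract)
BLOCK_SIZE = 32          # bytes (from the abstract)
NUM_WAYS = 4             # from the abstract
WORD_SIZE = 4            # bytes (assumed)

WORDS_PER_BLOCK = BLOCK_SIZE // WORD_SIZE            # 8 words per block
WORDS_PER_WAY_LINE = WORDS_PER_BLOCK // NUM_WAYS     # 2 words of a block per way
NUM_SETS = CACHE_SIZE // (BLOCK_SIZE * NUM_WAYS)     # 128 sets


def way_for_address(addr: int) -> int:
    """Return the way assumed to hold the requested word of a block."""
    offset = addr % BLOCK_SIZE           # byte offset within the block
    word_index = offset // WORD_SIZE     # which word of the block is requested
    # Assumed mapping: consecutive words are grouped, one group per way.
    return word_index // WORDS_PER_WAY_LINE


def ways_to_activate(addr: int) -> list[bool]:
    """Return a per-way enable mask with only the target way active."""
    target = way_for_address(addr)
    return [way == target for way in range(NUM_WAYS)]


if __name__ == "__main__":
    for addr in (0x00, 0x08, 0x1C):
        print(hex(addr), way_for_address(addr), ways_to_activate(addr))
```

Under these assumptions, a single access enables one of the four ways rather than all of them, which is the kind of selective activation the abstract credits for the energy savings.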
