Fast Speculative Address Generation and Way Caching for Reducing L1 Data Cache Energy

L1 data caches in high-performance processors continue to grow in set associativity. Higher associativity can significantly increase cache energy consumption. It can also lengthen cache access latency, and the longer execution time that results further raises overall energy consumption. At the same time, the static energy consumption of the cache increases significantly with each new process generation. This paper proposes a new approach to reducing overall L1 cache energy consumption that combines way caching with fast, speculative address generation. A 16-entry way cache storing a 3-bit way number for recently accessed L1 data cache lines is shown to be sufficient to significantly reduce both the static and the dynamic energy consumption of the L1 cache. Fast speculative address generation is highly accurate and helps to hide the way cache access latency. The L1 cache energy-delay product is reduced by 10% compared to using the way cache alone and by 37% compared to the multiple-MRU technique.
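The way-caching idea can be illustrated with a minimal behavioral sketch. It is not the paper's hardware design: the 16 entries and the 3-bit way number come from the abstract, while the 64-byte line size, the LRU replacement policy, and the names `WayCache` and `ways_probed` are assumptions made for illustration only. Speculative address generation and invalidation on L1 eviction are omitted.

```python
from collections import OrderedDict

LINE_BYTES = 64          # assumed L1 line size (not given in the abstract)
WAY_CACHE_ENTRIES = 16   # from the abstract
NUM_WAYS = 8             # a 3-bit way number can name up to 8 ways


class WayCache:
    """Hypothetical model: maps recently used line addresses to L1 way numbers."""

    def __init__(self, entries=WAY_CACHE_ENTRIES):
        self.entries = entries
        # Insertion order doubles as LRU order (assumed replacement policy).
        self.table = OrderedDict()

    def lookup(self, addr):
        """Return the cached way for addr's line, or None on a way-cache miss."""
        line = addr // LINE_BYTES
        way = self.table.get(line)
        if way is not None:
            self.table.move_to_end(line)  # mark as most recently used
        return way

    def update(self, addr, way):
        """Record which way holds addr's line, evicting the LRU entry if full."""
        assert 0 <= way < NUM_WAYS
        line = addr // LINE_BYTES
        self.table[line] = way
        self.table.move_to_end(line)
        if len(self.table) > self.entries:
            self.table.popitem(last=False)


def ways_probed(way_cache, addr, actual_way):
    """Number of L1 ways read for one access in this simplified model."""
    if way_cache.lookup(addr) == actual_way:
        return 1                      # way known in advance: probe one way
    way_cache.update(addr, actual_way)
    return NUM_WAYS                   # way unknown: probe all ways
```

In this model, a way-cache hit lets the access read a single way instead of all eight, which is the source of the dynamic energy saving the abstract describes.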
