Snug set-associative caches: Reducing leakage power of instruction and data caches with no performance penalties

As transistors keep shrinking and on-chip caches keep growing, static power dissipation resulting from leakage of caches takes an increasing fraction of total power in processors. Several techniques have already been proposed to reduce leakage power by turning off unused cache lines. However, they all have to pay the price of performance degradation. This paper presents a cache architecture, the snug set-associative (SSA) cache, that cuts most of static power dissipation of caches without incuring performance penalties. The SSA cache reduces leakage power by implementing the minimum set-associative scheme, which only activates the minimal numbers of ways in each cache set, while the performance losses caused by this scheme are compensated by the base-offset load/store queues. The rationale of combining these two techniques is locality: as the contents of the cache blocks in the current working set are repeatedly accessed, same addresses would be computed again and again. The SSA cache architecture can be applied to data and instruction caches to reduce leakage power without incurring performance penalties. Experimental results show that SSA can cut static power consumption of the L1 data cache by 93%, on average, for SPECint2000 benchmarks, while the execution times are reduced by 5%. Similarly, SSA can cut leakage dissipation of the L1 instruction cache by 92%, on average, and improve performance over 3%. Furthermore, when SSA is adopted for both L1 data and instruction caches, the normalized leakage of L1 data and instruction caches is lowered to 8%, on average, while still accomplishing a 2% reduction in execution times.

[1]  David H. Albonesi,et al.  Selective cache ways: on-demand cache resource allocation , 1999, MICRO-32. Proceedings of the 32nd Annual ACM/IEEE International Symposium on Microarchitecture.

[2]  Trevor Mudge,et al.  Drowsy instruction caches. Leakage power reduction using dynamic voltage scaling and cache sub-bank prediction , 2002, 35th Annual IEEE/ACM International Symposium on Microarchitecture, 2002. (MICRO-35). Proceedings..

[3]  Mahmut T. Kandemir,et al.  Soft error and energy consumption interactions: a data cache perspective , 2004, Proceedings of the 2004 International Symposium on Low Power Electronics and Design (IEEE Cat. No.04TH8758).

[4]  Richard E. Kessler,et al.  The Alpha 21264 microprocessor , 1999, IEEE Micro.

[5]  David Blaauw,et al.  Circuit and microarchitectural techniques for reducing cache leakage power , 2004, IEEE Transactions on Very Large Scale Integration (VLSI) Systems.

[6]  James R. Goodman,et al.  The declining effectiveness of dynamic caching for general- purpose microprocessors , 1995 .

[7]  Chia-Lin Yang,et al.  HotSpot cache: joint temporal and spatial locality exploitation for I-cache energy reduction , 2004, Proceedings of the 2004 International Symposium on Low Power Electronics and Design (IEEE Cat. No.04TH8758).

[8]  Margaret Martonosi,et al.  Cache decay: exploiting generational behavior to reduce cache leakage power , 2001, ISCA 2001.

[9]  Mahmut T. Kandemir,et al.  Exploiting program hotspots and code sequentiality for instruction cache leakage management , 2003, ISLPED '03.

[10]  Todd M. Austin,et al.  SimpleScalar: An Infrastructure for Computer System Modeling , 2002, Computer.

[11]  S. Turner,et al.  Performance Analysis Using the MIPS R10000 Performance Counters , 1996, Proceedings of the 1996 ACM/IEEE Conference on Supercomputing.

[12]  Alexander V. Veidenbaum,et al.  Reducing data cache energy consumption via cached load/store queue , 2003, ISLPED '03.

[13]  Frank Vahid,et al.  A Way-Halting Cache for Low-Energy High-Performance Systems , 2005, IEEE Computer Architecture Letters.

[14]  Yan Meng,et al.  On the limits of leakage power reduction in caches , 2005, 11th International Symposium on High-Performance Computer Architecture.

[15]  Wei Zhang,et al.  Static next sub-bank prediction for drowsy instruction cache , 2004, CASES '04.

[16]  Andrew S. Tanenbaum,et al.  Structured Computer Organization , 1976 .

[17]  Kaushik Roy,et al.  Reducing set-associative cache energy via way-prediction and selective direct-mapping , 2001, MICRO.

[18]  Vikas Agarwal,et al.  Static energy reduction techniques for microprocessor caches , 2001, Proceedings 2001 IEEE International Conference on Computer Design: VLSI in Computers and Processors. ICCD 2001.

[19]  Gurindar S. Sohi,et al.  A static power model for architects , 2000, MICRO 33.

[20]  Mark Horowitz,et al.  Performance tradeoffs in cache design , 1988, ISCA '88.

[21]  Craig B. Zilles,et al.  Decomposing the load-store queue by function for power reduction and scalability , 2006, IBM J. Res. Dev..

[22]  Eric Rotenberg,et al.  Adaptive mode control: a static-power-efficient cache design , 2001, Proceedings 2001 International Conference on Parallel Architectures and Compilation Techniques.

[23]  David Blaauw,et al.  Drowsy caches: simple techniques for reducing leakage power , 2002, ISCA.

[24]  David Blaauw,et al.  Drowsy instruction caches: leakage power reduction using dynamic voltage scaling and cache sub-bank prediction , 2002, MICRO.

[25]  Vikas Agarwal,et al.  Static energy reduction techniques for microprocessor caches , 2003, IEEE Trans. Very Large Scale Integr. Syst..

[26]  Frank Vahid,et al.  A highly configurable cache architecture for embedded systems , 2003, 30th Annual International Symposium on Computer Architecture, 2003. Proceedings..

[27]  B. Doyle Poly Poly nitride Poly Poly Silicon Silicon Resist Silicon Silicon Silicon nitride Resist OxideOxide Oxide Oxide Silicon poly poly Oxide Oxide , 2002 .

[28]  Lu Peng,et al.  Signature buffer: bridging performance gap between registers and caches , 2004, 10th International Symposium on High Performance Computer Architecture (HPCA'04).

[29]  Stefanos Kaxiras,et al.  A simple mechanism to adapt leakage-control policies to temperature , 2005, ISLPED '05. Proceedings of the 2005 International Symposium on Low Power Electronics and Design, 2005..

[30]  D. Blaauw,et al.  Single-V/sub DD/ and single-V/sub T/ super-drowsy techniques for low-leakage high-performance instruction caches , 2004, Proceedings of the 2004 International Symposium on Low Power Electronics and Design (IEEE Cat. No.04TH8758).

[31]  Shekhar Y. Borkar,et al.  Design challenges of technology scaling , 1999, IEEE Micro.

[32]  Kaushik Roy,et al.  Reducing set-associative cache energy via way-prediction and selective direct-mapping , 2001, Proceedings. 34th ACM/IEEE International Symposium on Microarchitecture. MICRO-34.

[33]  T. N. Vijaykumar,et al.  Reducing Design Complexity of the Load/Store Queue , 2003, MICRO.

[34]  Jeffrey Dean,et al.  ProfileMe: hardware support for instruction-level profiling on out-of-order processors , 1997, Proceedings of 30th Annual International Symposium on Microarchitecture.

[35]  David A. Patterson,et al.  Computer Architecture: A Quantitative Approach , 1969 .