Let caches decay: reducing leakage energy via exploitation of cache generational behavior

Power dissipation is increasingly important in CPUs ranging from those intended for mobile use, all the way up to high-performance processors for highend servers. Although the bulk of the power dissipated is dynamic switching power, leakage power is also beginning to be a concern. Chipmakers expect that in future chip generations, leakage's proportion of total chip power will increase significantly. This article examines methods for reducing leakage power within the cache memories of the CPU. Because caches comprise much of a CPU chip's area and transistor counts, they are reasonable targets for attacking leakage. We discuss policies and implementations for reducing cache leakage by invalidating and "turning off" cache lines when they hold data not likely to be reused. In particular, our approach is targeted at the generational nature of cache line usage. That is, cache lines typically have a flurry of frequent use when first brought into the cache, and then have a period of "dead time" before they are evicted. By devising effective, low-power ways of deducing dead time, our results show that in many cases we can reduce L1 cache leakage energy by 4x in SPEC2000 applications without having an impact on performance. Because our decay-based techniques have notions of competitive online algorithms at their roots, their energy usage can be theoretically bounded at within a factor of two of the optimal oracle-based policy. We also examine adaptive decay-based policies that make energy-minimizing policy choices on a per-application basis by choosing appropriate decay intervals individually for each cache line. Our proposed adaptive policies effectively reduce L1 cache leakage energy by 5x for the SPEC2000 with only negligible degradations in performance.

[1]  Kunle Olukotun,et al.  Designing High Bandwidth On-Chip Caches , 1997, ISCA.

[2]  Yale N. Patt,et al.  A comparison of dynamic branch predictors that use two levels of branch history , 1993, ISCA '93.

[3]  Miodrag Potkonjak,et al.  MediaBench: a tool for evaluating and synthesizing multimedia and communications systems , 1997, Proceedings of 30th Annual International Symposium on Microarchitecture.

[4]  David H. Albonesi,et al.  Selective cache ways: on-demand cache resource allocation , 1999, MICRO-32. Proceedings of the 32nd Annual ACM/IEEE International Symposium on Microarchitecture.

[5]  William J. Bowhill,et al.  Circuit Implementation of a 300-MHz 64-bit Second-generation CMOS Alpha CPU , 1995, Digit. Tech. J..

[6]  Mahmut T. Kandemir,et al.  DRAM energy management using software and hardware directed power mode control , 2001, Proceedings HPCA Seventh International Symposium on High-Performance Computer Architecture.

[7]  Brian N. Bershad,et al.  Reducing TLB and memory overhead using online superpage promotion , 1995, Proceedings 22nd Annual International Symposium on Computer Architecture.

[8]  Margaret Martonosi,et al.  Wattch: a framework for architectural-level power analysis and optimizations , 2000, Proceedings of 27th International Symposium on Computer Architecture (IEEE Cat. No.RS00201).

[9]  Carla Schlatter Ellis,et al.  Power aware page allocation , 2000, SIGP.

[10]  Steven K. Reinhardt,et al.  A fully associative software-managed cache design , 2000, Proceedings of 27th International Symposium on Computer Architecture (IEEE Cat. No.RS00201).

[11]  G. Tyson,et al.  Eager writeback-a technique for improving bandwidth utilization , 2000, Proceedings 33rd Annual IEEE/ACM International Symposium on Microarchitecture. MICRO-33 2000.

[12]  S. Turner,et al.  Performance Analysis Using the MIPS R10000 Performance Counters , 1996, Proceedings of the 1996 ACM/IEEE Conference on Supercomputing.

[13]  Anna R. Karlin,et al.  Empirical studies of competitve spinning for a shared-memory multiprocessor , 1991, SOSP '91.

[14]  M. Malik,et al.  Operating Systems , 1992, Lecture Notes in Computer Science.

[15]  Anna R. Karlin,et al.  Near-Optimal Parallel Prefetching and Caching , 2000, SIAM J. Comput..

[16]  K. Ghose,et al.  Analytical energy dissipation models for low power caches , 1997, Proceedings of 1997 International Symposium on Low Power Electronics and Design.

[17]  Wen-Hann Wang,et al.  On the inclusion properties for multi-level cache hierarchies , 1988, ISCA '88.

[18]  Shekhar Y. Borkar,et al.  Design challenges of technology scaling , 1999, IEEE Micro.

[19]  Kaushik Roy,et al.  An integrated circuit/architecture approach to reducing leakage in deep-submicron high-performance I-caches , 2001, Proceedings HPCA Seventh International Symposium on High-Performance Computer Architecture.

[20]  James R. Goodman,et al.  The declining effectiveness of dynamic caching for general- purpose microprocessors , 1995 .

[21]  Mark C. Johnson,et al.  Estimation of standby leakage power in CMOS circuits considering accurate modeling of transistor stacks , 1998, ISLPED '98.

[22]  Jeffrey Dean,et al.  ProfileMe: hardware support for instruction-level profiling on out-of-order processors , 1997, Proceedings of 30th Annual International Symposium on Microarchitecture.

[23]  B. Falsafi,et al.  Selective, accurate, and timely self-invalidation using last-touch prediction , 2000, Proceedings of 27th International Symposium on Computer Architecture (IEEE Cat. No.RS00201).

[24]  G. Sohi,et al.  A static power model for architects , 2000, Proceedings 33rd Annual IEEE/ACM International Symposium on Microarchitecture. MICRO-33 2000.

[25]  Jih-Kwon Peir,et al.  Capturing dynamic memory reference behavior with adaptive cache topology , 1998, ASPLOS VIII.

[26]  Kaushik Roy,et al.  Gated-Vdd: a circuit technique to reduce leakage in deep-submicron cache memories , 2000, ISLPED '00.

[27]  P. Gargini,et al.  The International Technology Roadmap for Semiconductors (ITRS): "Past, present and future" , 2000, GaAs IC Symposium. IEEE Gallium Arsenide Integrated Circuits Symposium. 22nd Annual Technical Digest 2000. (Cat. No.00CH37084).

[28]  David A. Wood,et al.  Dynamic self-invalidation: reducing coherence overhead in shared-memory multiprocessors , 1995, Proceedings 22nd Annual International Symposium on Computer Architecture.

[29]  David A. Wood,et al.  A model for estimating trace-sample miss ratios , 1991, SIGMETRICS '91.

[30]  Doug Burger,et al.  Evaluating Future Microprocessors: the SimpleScalar Tool Set , 1996 .