Using Coherence Information and Decay Techniques to Optimize L2 Cache Leakage in CMPs

This paper evaluates several techniques to reduce leakage energy in CMP L2 caches by selectively switching off the less-used lines. We focus primarily on private snoopy L2 caches, where coherence must be preserved in all situations, and especially when a line is turned off to save power. We introduce three techniques: the first turns off cache lines in response to coherence-protocol invalidations, the second is an implementation of cache decay adapted to coherent caches, and the third is a performance-optimized decay-based technique for coherent caches. Experimental results, obtained with accurate performance/thermal/energy models, show that appreciable power savings can be achieved by properly designing a leakage optimization technique. We target a CMP with 4 cores and 1 to 8 MB of total cache. For 4 MB, the proposed techniques achieve energy reductions of 13%, 30%, and 21%, respectively, at the cost of 0%, 8%, and 2% performance loss. For other cache sizes, the behavior is qualitatively similar.
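
To make the first two ideas concrete, here is a minimal sketch of a single cache line that is gated off either on a coherence invalidation or after a period of inactivity. It is an illustration only, not the paper's implementation: the MESI state set, the `CacheLine` class, the `decay_interval` parameter, and the rule that a dirty line is flushed before power-off are all assumptions made for this example. The third, performance-optimized technique is not sketched, because the abstract does not describe its policy.

```python
from dataclasses import dataclass
from enum import Enum


class State(Enum):
    # Assumed MESI states; the abstract does not name the protocol.
    MODIFIED = "M"
    EXCLUSIVE = "E"
    SHARED = "S"
    INVALID = "I"


@dataclass
class CacheLine:
    state: State = State.INVALID
    powered: bool = False   # False means the line is gated off
    idle_ticks: int = 0     # coarse ticks since the last local access
    flushes: int = 0        # dirty-data flushes performed before power-off

    def fill(self, state: State) -> None:
        """Bring data into the line, which powers it on."""
        self.state, self.powered, self.idle_ticks = state, True, 0

    def access(self) -> bool:
        """A local hit resets the decay counter; returns False on a miss."""
        if not self.powered or self.state is State.INVALID:
            return False
        self.idle_ticks = 0
        return True

    def _power_off(self) -> None:
        # Assumption: dirty data must leave the line before it loses power.
        if self.state is State.MODIFIED:
            self.flushes += 1
        self.state, self.powered, self.idle_ticks = State.INVALID, False, 0

    def snoop_invalidate(self) -> None:
        """Invalidation-driven turn-off: the line is dead, so gate it."""
        if self.powered:
            self._power_off()

    def tick(self, decay_interval: int) -> None:
        """Decay-driven turn-off after decay_interval idle ticks."""
        if not self.powered:
            return
        self.idle_ticks += 1
        if self.idle_ticks >= decay_interval:
            self._power_off()


if __name__ == "__main__":
    line = CacheLine()
    line.fill(State.MODIFIED)
    for _ in range(4):
        line.tick(decay_interval=4)
    print(line.powered, line.state.value, line.flushes)
```

The sketch hints at the trade-off the abstract reports: a line turned off by an invalidation already holds no valid data, so gating it costs no extra misses, whereas a decayed line may still have been useful, so a later access to it misses.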
