Virtual Exclusion: An architectural approach to reducing leakage energy in caches for multiprocessor systems

This paper proposes virtual exclusion, an architectural technique to reduce leakage energy in the L2 caches of cache-coherent multiprocessor systems. The technique leverages two previously proposed circuit techniques, gated Vdd and drowsy cache, and yields a low-cost, easily implementable scheme for such systems. Virtual exclusion saves leakage energy by keeping the data portion of repetitive cache lines turned off in the large higher-level caches while still maintaining multi-level inclusion, an essential property for an efficient implementation of conventional cache coherence protocols. Because the scheme exploits the state information already present in the snoop-based cache coherence protocol, it incurs almost no extra hardware overhead. In our experiments, the SPLASH-2 multiprocessor benchmark suite executed correctly under the new virtual exclusion policy and showed up to 72% savings in leakage energy (on average, 46% for SMP and 35% for multicore in L2) over a baseline drowsy L2 cache.
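The idea of keeping tags live while gating the data portion can be illustrated with a small toy model. This is only a sketch under stated assumptions, not the paper's implementation: the class and method names (`L2Line`, `ToyL2`, `fill_to_l1`, `l1_evict`) are hypothetical, the rule that a line's data is gated off exactly while an L1 holds a copy is an assumption made for illustration, and an explicit `data_on` flag stands in for whatever coherence state bits the actual scheme reuses.

```python
from dataclasses import dataclass


@dataclass
class L2Line:
    tag: int
    data_on: bool = True  # False models a gated-off data entry (assumption)


class ToyL2:
    """Toy L2 model: tags stay live so inclusion holds; data may be gated off."""

    def __init__(self):
        self.lines = {}

    def fill_to_l1(self, tag):
        # Assumed policy: when a line is forwarded to an L1, the L2 keeps
        # the tag (preserving inclusion) but gates off the duplicated data.
        self.lines[tag] = L2Line(tag, data_on=False)

    def l1_evict(self, tag):
        # Assumed policy: when the L1 gives up the line, the L2 data entry
        # is powered again and refilled from the L1's copy.
        self.lines[tag].data_on = True

    def includes(self, tag):
        # Inclusion check needs only the tag, which is never gated off here.
        return tag in self.lines

    def data_off_fraction(self):
        # Fraction of resident L2 lines whose data portion is gated off.
        if not self.lines:
            return 0.0
        off = sum(1 for line in self.lines.values() if not line.data_on)
        return off / len(self.lines)


def demo():
    l2 = ToyL2()
    for tag in (1, 2, 3, 4):
        l2.fill_to_l1(tag)
    l2.l1_evict(2)
    return l2


if __name__ == "__main__":
    l2 = demo()
    print(l2.data_off_fraction(), all(l2.includes(t) for t in (1, 2, 3, 4)))
```

In this toy model a snoop can still be answered from the L2 tags alone, which is the sense in which inclusion is preserved even though the data array entries are off. The fraction of gated-off lines is not a leakage-energy estimate; relating it to the savings reported above would require the circuit-level parameters used in the paper.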
