Utility Aware Snoozy Caches for Energy Efficient Chip Multi-Processors

Heavy leakage power consumption of on-chip last level caches (LLCs) has become the primary obstacle in architecting chip multi-processors (CMPs). Because leakage power depends directly on the supply voltage, dynamic voltage scaling (DVS) of the LLC banks, driven by periodic access profiles, is a promising way to reduce this leakage. Many prior techniques reduce leakage by estimating the working set size (WSS) of the applications and putting portions of the cache banks into a low power mode. The proposed work aims to reduce leakage by putting whole LLC banks with minimal usage into a low power (snoozy) mode through bank-level DVS. The resulting performance impact is mitigated by switching some snoozy banks back to active mode on demand. Experimental evaluation using full system simulation of a multi-banked 2MB 8-way set associative L2 cache shows 10% more leakage savings on average than a prior drowsy technique.
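The bank-level policy outlined above can be illustrated with a minimal Python sketch. It is not the authors' implementation: the per-bank access counter, the interval-based decision, and the names `Bank`, `end_of_interval`, `record_access`, `low_threshold`, and `wake_threshold` are all assumptions introduced here for illustration, since the abstract does not specify how usage is measured or when a bank is woken.

```python
from dataclasses import dataclass

ACTIVE, SNOOZY = "active", "snoozy"


@dataclass
class Bank:
    """One LLC bank with a hypothetical per-interval access counter."""
    bank_id: int
    mode: str = ACTIVE
    accesses: int = 0  # accesses observed in the current profiling interval


def end_of_interval(banks, low_threshold):
    """At the end of a profiling interval, lower the voltage (snoozy mode)
    of active banks whose access count is below low_threshold (assumed
    criterion for "minimal usage"), then reset all counters."""
    for bank in banks:
        if bank.mode == ACTIVE and bank.accesses < low_threshold:
            bank.mode = SNOOZY
        bank.accesses = 0


def record_access(bank, wake_threshold):
    """Count one access; return a snoozy bank to active mode on demand
    once it has seen wake_threshold accesses in the current interval
    (assumed wake-up rule)."""
    bank.accesses += 1
    if bank.mode == SNOOZY and bank.accesses >= wake_threshold:
        bank.mode = ACTIVE
```

For example, with four banks that saw 100, 2, 50, and 0 accesses in an interval and `low_threshold=10`, the second and fourth banks would enter snoozy mode; a later burst of accesses to the second bank would bring it back to active mode once `wake_threshold` is reached.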
