Paper Information - EECache: A Comprehensive Study on the Architectural Design for Energy-Efficient Last-Level Caches in Chip Multiprocessors


Power management for large last-level caches (LLCs) is important in chip multiprocessors (CMPs), as the leakage power of LLCs accounts for a significant fraction of the limited on-chip power budget. Since not all workloads running on CMPs need the entire cache, portions of a large, shared LLC can be disabled to save energy. In this article, we explore different design choices, from circuit-level cache organization to microarchitectural management policies, to propose a low-overhead runtime mechanism for energy reduction in the large, shared LLC. We first introduce a slice-based cache organization that can shut down parts of the shared LLC with minimal circuit overhead. Based on this slice-based organization, part of the shared LLC can be turned off according to the spatial and temporal cache access behavior captured by low-overhead sampling-based hardware. To eliminate the performance penalties caused by flushing data before powering off a cache slice, we propose data migration policies to prevent the loss of useful data in the LLC. Results show that our energy-efficient cache design (EECache) provides 14.1% energy savings at only 1.2% performance degradation and incurs negligible hardware overhead compared to prior work.
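The general idea in the abstract (gate a lightly used slice, but migrate its useful blocks first instead of flushing them) can be illustrated with a toy software model. This is only a sketch under stated assumptions and not the paper's actual mechanism: the `Slice` class, the `hit_threshold` parameter, the per-block reuse counts, and the fixed per-slice `capacity` are all hypothetical names and simplifications introduced here for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Slice:
    """Toy model of one LLC slice (hypothetical, not from the paper)."""
    powered: bool = True
    # address -> reuse count; a stand-in for whatever reuse information is tracked
    blocks: dict = field(default_factory=dict)
    # hits observed by an assumed sampling monitor during the current epoch
    sampled_hits: int = 0


def pick_slice_to_gate(slices, hit_threshold):
    """Return the index of the least-used powered slice if its sampled
    hit count is below hit_threshold, otherwise None."""
    powered = [i for i, s in enumerate(slices) if s.powered]
    if len(powered) <= 1:
        return None  # always keep at least one slice on
    victim = min(powered, key=lambda i: slices[i].sampled_hits)
    return victim if slices[victim].sampled_hits < hit_threshold else None


def gate_with_migration(slices, victim, capacity):
    """Power off slices[victim]. Before doing so, move its most-reused
    blocks into free space in the other powered slices; blocks that do
    not fit are dropped. Returns the number of migrated blocks."""
    src = slices[victim]
    targets = [s for i, s in enumerate(slices) if s.powered and i != victim]
    migrated = 0
    for addr, reuse in sorted(src.blocks.items(), key=lambda kv: -kv[1]):
        dest = next((t for t in targets if len(t.blocks) < capacity), None)
        if dest is None:
            break  # no free space left anywhere
        dest.blocks[addr] = reuse
        migrated += 1
    src.blocks.clear()
    src.powered = False
    return migrated


# Usage: slice 0 is lightly used, slice 1 is busy and has one free entry.
slices = [
    Slice(blocks={0x10: 5, 0x20: 1}, sampled_hits=2),
    Slice(blocks={0x30: 3}, sampled_hits=40),
]
victim = pick_slice_to_gate(slices, hit_threshold=8)
moved = gate_with_migration(slices, victim, capacity=2)
```

In this example the lightly used slice is selected, its most-reused block is moved into the remaining slice, and the block that does not fit is discarded. A real design would also have to handle dirty-data writeback, address remapping, and slice re-activation, none of which this sketch models.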
