Temperature-aware runtime power management for chip-multiprocessors with 3-D stacked cache

The advent of 3-D fabrication technology makes it possible to stack a large amount of last-level cache memory onto a multi-core die, which reduces off-chip memory accesses and thus increases system performance. However, the higher power density (i.e., power dissipation per unit volume) of 3-D integrated circuits (ICs) can cause temperature-related problems in reliability, leakage power, system performance, and cooling cost. In this paper, we propose a runtime solution that maximizes the performance (i.e., instruction throughput) of chip-multiprocessors with 3-D stacked last-level cache memory without violating thermal constraints. The proposed method combines runtime cache tuning (e.g., cache-way partitioning, cache-way power-gating, cache data placement) with per-core dynamic voltage/frequency scaling (DVFS) in a temperature-aware manner. Experimental results show that the integrated method offers a 23% performance improvement on average, in terms of instructions per second (IPS), compared with temperature-aware runtime cache tuning alone.
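To make the idea of temperature-constrained per-core DVFS concrete, the following is a minimal sketch, not the algorithm proposed in the paper. It covers only the DVFS part (no cache tuning) and rests on assumptions that do not come from the source: the frequency levels, the cubic power model, the linear steady-state thermal model with its coefficients, and the names `Core`, `ips`, `power_w`, `peak_temp_c`, and `choose_levels` are all hypothetical. The sketch greedily raises the frequency of whichever core gains the most IPS, as long as the estimated peak temperature stays under the limit.

```python
from dataclasses import dataclass

# All constants below are illustrative assumptions, not values from the paper.
FREQ_LEVELS_GHZ = [1.0, 1.5, 2.0, 2.5, 3.0]  # hypothetical per-core DVFS levels
T_AMBIENT_C = 45.0                           # hypothetical ambient temperature
T_LIMIT_C = 85.0                             # hypothetical thermal constraint


@dataclass
class Core:
    cpi_compute: float         # cycles per instruction, excluding memory stalls
    stall_ns_per_instr: float  # memory stall time per instruction (ns), frequency-independent


def ips(core: Core, f_ghz: float) -> float:
    """Instructions per second: time per instruction = cpi/f + memory stall."""
    return 1e9 / (core.cpi_compute / f_ghz + core.stall_ns_per_instr)


def power_w(f_ghz: float) -> float:
    """Toy power model: fixed leakage plus dynamic power ~ f^3 (voltage assumed to track f)."""
    return 1.0 + 0.8 * f_ghz ** 3


def peak_temp_c(levels: list[int]) -> float:
    """Toy steady-state model: self-heating plus coupling to total chip power."""
    powers = [power_w(FREQ_LEVELS_GHZ[lv]) for lv in levels]
    total = sum(powers)
    return max(T_AMBIENT_C + 1.2 * p + 0.15 * total for p in powers)


def choose_levels(cores: list[Core]) -> list[int]:
    """Greedy search: repeatedly apply the single-step upgrade with the largest IPS gain
    that keeps the estimated peak temperature within T_LIMIT_C."""
    levels = [0] * len(cores)
    while True:
        best = None  # (ips_gain, core_index)
        for i, core in enumerate(cores):
            if levels[i] + 1 >= len(FREQ_LEVELS_GHZ):
                continue  # already at the highest level
            trial = levels.copy()
            trial[i] += 1
            if peak_temp_c(trial) > T_LIMIT_C:
                continue  # upgrade would violate the thermal constraint
            gain = ips(core, FREQ_LEVELS_GHZ[trial[i]]) - ips(core, FREQ_LEVELS_GHZ[levels[i]])
            if best is None or gain > best[0]:
                best = (gain, i)
        if best is None:
            return levels  # no feasible upgrade remains
        levels[best[1]] += 1


# Four hypothetical cores, ordered from compute-bound to memory-bound.
cores = [Core(1.0, 0.05), Core(1.0, 0.1), Core(1.0, 0.5), Core(1.0, 1.0)]
levels = choose_levels(cores)
print([FREQ_LEVELS_GHZ[lv] for lv in levels], round(peak_temp_c(levels), 1))
```

Under these toy models, the memory-bound core benefits least from a higher clock, so it is the one left below the top frequency when the thermal budget runs out. A real controller would replace the hard-coded models with measured or simulated power and temperature.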
