PGCapping: Exploiting power gating for power capping and core lifetime balancing in CMPs

Optimizing the performance of a chip multiprocessor (CMP) within a power cap has recently received considerable attention. However, most existing solutions rely solely on dynamic voltage and frequency scaling (DVFS), which is anticipated to have only a limited actuation range in the future. Power gating shuts down idle cores in a CMP so that more power can be shifted to the cores that run applications, improving CMP performance. Current preliminary studies on integrating the two knobs focus on deciding the power gating and DVFS levels in a tightly coupled fashion, with much less attention given to decoupled designs. Decoupling two knobs that may interfere with each other allows each knob's management algorithm to be less complex and to exploit the characteristics of its knob more efficiently. This paper proposes PGCapping, a decoupled design that integrates power gating with DVFS for CMP power capping. To fully utilize the power headroom reserved through power gating, PGCapping enables per-core overclocking on turned-on cores that run sequential applications. Because per-core overclocking may make some cores age much faster than others, and thus become the reliability bottleneck of the whole system, PGCapping also uses power gating to balance the core lifetimes. Our empirical results on a hardware testbed show that the proposed scheme achieves up to 42.0% better average application performance than five state-of-the-art power capping baselines for realistic multi-core applications, i.e., a mixed group of PARSEC and SPEC CPU2006 benchmarks. Furthermore, our extensive simulation results with real-world traces demonstrate that a lightweight lifetime balancing algorithm (based on power gating) can increase the CMP lifetime by 9.2% on average.
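The abstract does not spell out the algorithms, so the following is only an illustrative sketch of the two ideas it names, not PGCapping itself. It assumes a hypothetical per-core `wear` estimate, assumes a gated core draws no power, and splits the cap evenly across active cores; the names `Core`, `choose_gated_cores`, and `per_core_budget` are invented for this example.

```python
from dataclasses import dataclass


@dataclass
class Core:
    core_id: int
    wear: float  # hypothetical accumulated-aging estimate, arbitrary units


def choose_gated_cores(cores, needed):
    """Keep the `needed` least-worn cores on; return ids of cores to gate.

    Gating the most-worn cores lets them rest, which is one simple way
    to even out aging across the chip (an assumption of this sketch).
    """
    if needed < 0:
        raise ValueError("needed must be non-negative")
    ranked = sorted(cores, key=lambda c: (c.wear, c.core_id))
    return sorted(c.core_id for c in ranked[needed:])


def per_core_budget(power_cap, total_cores, gated_cores):
    """Evenly split the chip power cap among cores that remain on.

    Assumes gated cores consume zero power, so their share becomes
    headroom for the active cores (e.g. for overclocking).
    """
    active = total_cores - gated_cores
    if active <= 0:
        raise ValueError("at least one core must stay on")
    return power_cap / active


if __name__ == "__main__":
    cores = [Core(0, 5.0), Core(1, 1.0), Core(2, 3.0), Core(3, 9.0)]
    gated = choose_gated_cores(cores, needed=2)
    print("gate cores:", gated)
    print("budget per active core (W):",
          per_core_budget(80.0, len(cores), len(gated)))
```

In this toy setup, gating two of four cores under an 80 W cap doubles the budget available to each remaining core, which is the kind of headroom the abstract says is spent on per-core overclocking.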
