Resource Sharing Centric Dynamic Voltage and Frequency Scaling for CMP Cores, Uncore, and Memory

With the breakdown of Dennard’s scaling over the past decade, performance growth of modern microprocessor design has largely relied on scaling core count in chip multiprocessors (CMPs). The challenge of chip power density, however, remains and demands new power management solutions. This work investigates a coordinated CMP systemwide Dynamic Voltage and Frequency Scaling (DVFS) policy centered around shared resource utilization. This approach represents a new angle on the problem, differing from the conventional core-workload-driven approaches. The key component of our work is per-core DVFS leveraging a technique similar to TCP Vegas congestion control from networking. This TCP Vegas–based DVFS can potentially identify the synergy between power reduction and performance improvement. Further, this work includes uncore (on-chip interconnect and shared last level cache) and main memory DVFS policies coordinated with the per-core DVFS policy. Full system simulations on PARSEC benchmarks show that our technique reduces total energy dissipation by over 47% across all benchmarks with less than 2.3% performance degradation. Our work also leads to 12% more energy savings compared to a prior work CMP DVFS policy.

[1]  Rajesh Kumar,et al.  A family of 45nm IA processors , 2009, 2009 IEEE International Solid-State Circuits Conference - Digest of Technical Papers.

[2]  Stijn Eyerman,et al.  Fine-grained DVFS using on-chip regulators , 2011, TACO.

[3]  Diana Marculescu,et al.  Power-aware performance increase via core/uncore reinforcement control for chip-multiprocessors , 2012, ISLPED '12.

[4]  Bishop Brock,et al.  Active Guardband Management in Power7+ to Save Energy and Maintain Reliability , 2013, IEEE Micro.

[5]  Steven H. Low,et al.  Understanding TCP Vegas: a duality model , 2002 .

[6]  James E. Smith,et al.  Virtual private caches , 2007, ISCA '07.

[7]  Qingyuan Deng,et al.  MemScale: active low-power modes for main memory , 2011, ASPLOS XVI.

[8]  Stefanos Kaxiras,et al.  Green governors: A framework for Continuously Adaptive DVFS , 2011, 2011 International Green Computing Conference and Workshops.

[9]  Pierre Michaud,et al.  Demystifying multicore throughput metrics , 2013, IEEE Computer Architecture Letters.

[10]  Margaret Martonosi,et al.  An Analysis of Efficient Multi-Core Global Power Management Policies: Maximizing Performance for a Given Power Budget , 2006, 2006 39th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO'06).

[11]  Jung Ho Ahn,et al.  McPAT: An integrated power, area, and timing modeling framework for multicore and manycore architectures , 2009, 2009 42nd Annual IEEE/ACM International Symposium on Microarchitecture (MICRO).

[12]  Kevin Kai-Wei Chang,et al.  HAT: Heterogeneous Adaptive Throttling for On-Chip Networks , 2012, 2012 IEEE 24th International Symposium on Computer Architecture and High Performance Computing.

[13]  Massoud Pedram,et al.  Supervised Learning Based Power Management for Multicore Processors , 2010, IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems.

[14]  Axel Jantsch,et al.  Adaptive Power Management for the On-Chip Communication Network , 2006, 9th EUROMICRO Conference on Digital System Design (DSD'06).

[15]  Diana Marculescu,et al.  Analysis of dynamic voltage/frequency scaling in chip-multiprocessors , 2007, Proceedings of the 2007 international symposium on Low power electronics and design (ISLPED '07).

[16]  Margaret Martonosi,et al.  Coordinated, distributed, formal energy management of chip multiprocessors , 2005, ISLPED '05. Proceedings of the 2005 International Symposium on Low Power Electronics and Design, 2005..

[17]  Ravi R. Iyer,et al.  CQoS: a framework for enabling QoS in shared caches of CMP platforms , 2004, ICS '04.

[18]  Onur Mutlu,et al.  Parallelism-Aware Batch Scheduling: Enhancing both Performance and Fairness of Shared DRAM Systems , 2008, 2008 International Symposium on Computer Architecture.

[19]  Tajana Simunic,et al.  System-Level Power Management Using Online Learning , 2009, IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems.

[20]  Hai Zhou,et al.  Parallel CAD: Algorithm Design and Programming Special Section Call for Papers TODAES: ACM Transactions on Design Automation of Electronic Systems , 2010 .

[21]  Engin Ipek,et al.  Dynamic Multicore Resource Management: A Machine Learning Approach , 2009, IEEE Micro.

[22]  Onur Mutlu,et al.  Preemptive Virtual Clock: A flexible, efficient, and cost-effective QOS scheme for networks-on-chip , 2009, 2009 42nd Annual IEEE/ACM International Symposium on Microarchitecture (MICRO).

[23]  Michael L. Scott,et al.  Dynamic frequency and voltage control for a multiple clock domain microarchitecture , 2002, MICRO.

[24]  Jiang Hu,et al.  Having your cake and eating it too: Energy savings without performance loss through resource sharing driven power management , 2015, 2015 IEEE/ACM International Symposium on Low Power Electronics and Design (ISLPED).

[25]  Andrew B. Kahng,et al.  ORION 2.0: A Power-Area Simulator for Interconnection Networks , 2012, IEEE Transactions on Very Large Scale Integration (VLSI) Systems.

[26]  Lizy Kurian John,et al.  Efficient traffic aware power management for multicore communications processors , 2012, 2012 ACM/IEEE Symposium on Architectures for Networking and Communications Systems (ANCS).

[27]  Kai Li,et al.  The PARSEC benchmark suite: Characterization and architectural implications , 2008, 2008 International Conference on Parallel Architectures and Compilation Techniques (PACT).

[28]  Krste Asanovic,et al.  Globally-Synchronized Frames for Guaranteed Quality-of-Service in On-Chip Networks , 2008, 2008 International Symposium on Computer Architecture.

[29]  Li-Shiuan Peh,et al.  CoQoS: Coordinating QoS-aware shared resources in NoC-based SoCs , 2011, J. Parallel Distributed Comput..

[30]  David Blaauw,et al.  Theoretical and practical limits of dynamic voltage scaling , 2004, Proceedings. 41st Design Automation Conference, 2004..

[31]  Larry L. Peterson,et al.  TCP Vegas: End to End Congestion Avoidance on a Global Internet , 1995, IEEE J. Sel. Areas Commun..

[32]  R. Srikant,et al.  Network Optimization and Control , 2008, Found. Trends Netw..

[33]  James E. Smith,et al.  Fair Queuing Memory Systems , 2006, 2006 39th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO'06).

[34]  Xi Chen,et al.  Up by their bootstraps: Online learning in Artificial Neural Networks for CMP uncore power management , 2014, 2014 IEEE 20th International Symposium on High Performance Computer Architecture (HPCA).

[35]  Norman P. Jouppi,et al.  CACTI 6.0: A Tool to Model Large Caches , 2009 .

[36]  Margaret Martonosi,et al.  Thread criticality predictors for dynamic performance, power, and resource management in chip multiprocessors , 2009, ISCA '09.

[37]  Xi Chen,et al.  In-network Monitoring and Control Policy for DVFS of CMP Networks-on-Chip and Last Level Caches , 2012, 2012 IEEE/ACM Sixth International Symposium on Networks-on-Chip.

[38]  Luiz André Barroso,et al.  The Datacenter as a Computer: An Introduction to the Design of Warehouse-Scale Machines , 2009, The Datacenter as a Computer: An Introduction to the Design of Warehouse-Scale Machines.

[39]  S. Low,et al.  Understanding Vegas: a duality model , 2002 .

[40]  John Kim,et al.  Probabilistic Distance-Based Arbitration: Providing Equality of Service for Many-Core CMPs , 2010, 2010 43rd Annual IEEE/ACM International Symposium on Microarchitecture.

[41]  R.H. Dennard,et al.  Design Of Ion-implanted MOSFET's with Very Small Physical Dimensions , 1974, Proceedings of the IEEE.

[42]  Ying Tan,et al.  Achieving autonomous power management using reinforcement learning , 2013, TODE.

[43]  Onur Mutlu,et al.  Fairness via source throttling: a configurable and high-performance fairness substrate for multi-core memory systems , 2010, ASPLOS 2010.

[44]  Xi Chen,et al.  Dynamic voltage and frequency scaling for shared resources in multicore processor designs , 2013, 2013 50th ACM/EDAC/IEEE Design Automation Conference (DAC).

[45]  Margaret Martonosi,et al.  Formal online methods for voltage/frequency control in multiple clock domain microprocessors , 2004, ASPLOS XI.

[46]  Radu Marculescu,et al.  Variation-adaptive feedback control for networks-on-chip with multiple clock domains , 2008, 2008 45th ACM/IEEE Design Automation Conference.

[47]  Thomas F. Wenisch,et al.  CoScale: Coordinating CPU and Memory System DVFS in Server Systems , 2012, 2012 45th Annual IEEE/ACM International Symposium on Microarchitecture.

[48]  Meeta Sharma Gupta,et al.  System level analysis of fast, per-core DVFS using on-chip switching regulators , 2008, 2008 IEEE 14th International Symposium on High Performance Computer Architecture.