Dynamic management of TurboMode in modern multi-core chips

Dynamic overclocking of CPUs, or TurboMode, is a feature recently introduced on all x86 multi-core chips. It leverages thermal and power headroom from idle execution resources to overclock active cores to increase performance. TurboMode can accelerate CPU-bound applications at the cost of additional power consumption. Nevertheless, naive use of TurboMode can significantly increase power consumption without increasing performance. Thus far, there is no strategy for managing TurboMode to optimize its use across all workloads and efficiency metrics. This paper analyzes the impact of TurboMode on a wide range of efficiency metrics (performance, power, cost, and combined metrics such as QPS/W and ED2) for representative server workloads on various hardware configurations. We determine that TurboMode is generally beneficial for performance (up to +24%), cost efficiency (QPS/$ up to +8%), energy-delay product (ED, up to +47%), and energy-delay-squared product (ED2, up to +68%). However, TurboMode is inefficient for workloads that exhibit interference for shared resources. We use this information to build and validate a model that predicts the optimal TurboMode setting for each efficiency metric. We then implement autoturbo, a background daemon that dynamically manages TurboMode in real time without any hardware changes. We demonstrate that autoturbo improves QPS/$, ED, and ED2 by 8%, 47%, and 68% respectively over not using TurboMode. At the same time, autoturbo virtually eliminates all the large drops in those same metrics (-12%, -25%, -25% for QPS/$, ED, and ED2) that occur when TurboMode is used naively (always on).

[1]  Vanish Talwar,et al.  Power Management of Datacenter Workloads Using Per-Core Power Gating , 2009, IEEE Computer Architecture Letters.

[2]  Adam Rifkin,et al.  Nutch: A Flexible and Scalable Open-Source Web Search Engine , 2005 .

[3]  Trevor N. Mudge,et al.  Understanding and Designing New Server Architectures for Emerging Warehouse-Computing Environments , 2008, 2008 International Symposium on Computer Architecture.

[4]  Hewlett Packard,et al.  Enterprise Power and Cooling: a Chip-to-data Center Perspective , .

[5]  Babak Falsafi,et al.  Clearing the clouds: a study of emerging scale-out workloads on modern hardware , 2012, ASPLOS XVII.

[6]  Lingjia Tang,et al.  Directly characterizing cross core interference through contention synthesis , 2011, HiPEAC.

[7]  Kushagra Vaid,et al.  Web search using mobile cores: quantifying and mitigating the price of efficiency , 2010, ISCA.

[8]  Yinglian Xie,et al.  Locality in search engine queries and its implications for caching , 2002, Proceedings.Twenty-First Annual Joint Conference of the IEEE Computer and Communications Societies.

[9]  Meeta Sharma Gupta,et al.  System level analysis of fast, per-core DVFS using on-chip switching regulators , 2008, 2008 IEEE 14th International Symposium on High Performance Computer Architecture.

[10]  Luiz André Barroso,et al.  The Datacenter as a Computer: An Introduction to the Design of Warehouse-Scale Machines , 2009, The Datacenter as a Computer: An Introduction to the Design of Warehouse-Scale Machines.

[11]  Christoforos E. Kozyrakis,et al.  Scalable and Efficient Fine-Grained Cache Partitioning with Vantage , 2012, IEEE Micro.

[12]  John Wilkes,et al.  Keynote , 2019, FCRC '15.

[13]  James Charles,et al.  Evaluation of the Intel® Core™ i7 Turbo Boost feature , 2009, 2009 IEEE International Symposium on Workload Characterization (IISWC).

[14]  Ramesh Illikkal,et al.  Rate-based QoS techniques for cache/memory in CMP platforms , 2009, ICS.

[15]  Margaret Martonosi,et al.  Long-term workload phases: duration predictions and applications to DVFS , 2005, IEEE Micro.

[16]  Stijn Eyerman,et al.  A Counter Architecture for Online DVFS Profitability Estimation , 2010, IEEE Transactions on Computers.

[17]  Ivan Bratko,et al.  Microarray data mining with visual programming , 2005, Bioinform..

[18]  Frank Bellosa,et al.  Resource-conscious scheduling for energy efficiency on multicore processors , 2010, EuroSys '10.

[19]  Margaret Martonosi,et al.  Runtime Power Monitoring in High-End Processors: Methodology and Empirical Data , 2003, MICRO.

[20]  Marios C. Papaefthymiou,et al.  Computational sprinting , 2012, IEEE International Symposium on High-Performance Comp Architecture.

[21]  Magnus Jahre,et al.  A Quantitative Study of Memory System Interference in Chip Multiprocessor Architectures , 2009, 2009 11th IEEE International Conference on High Performance Computing and Communications.

[22]  Aamer Jaleel,et al.  Adaptive insertion policies for managing shared caches , 2008, 2008 International Conference on Parallel Architectures and Compilation Techniques (PACT).

[23]  Randy H. Katz,et al.  An energy case for hybrid datacenters , 2010, OPSR.

[24]  Alon Naveh,et al.  Power management architecture of the 2nd generation Intel® Core microarchitecture, formerly codenamed Sandy Bridge , 2011, IEEE Hot Chips Symposium.

[25]  Margaret Martonosi,et al.  An Analysis of Efficient Multi-Core Global Power Management Policies: Maximizing Performance for a Given Power Budget , 2006, 2006 39th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO'06).

[26]  Kai Li,et al.  The PARSEC benchmark suite: Characterization and architectural implications , 2008, 2008 International Conference on Parallel Architectures and Compilation Techniques (PACT).

[27]  Amip J. Shah,et al.  Cost Model for Planning, Development and Operation of a Data Center , 2005 .

[28]  Manish Gupta,et al.  Power-Aware Microarchitecture: Design and Modeling Challenges for Next-Generation Microprocessors , 2000, IEEE Micro.