Fine-Tuning the Active Timing Margin (ATM) Control Loop for Maximizing Multi-core Efficiency on an IBM POWER Server

Active Timing Margin (ATM) is a technology that improves processor efficiency by reducing the pipeline timing margin with a control loop that adjusts voltage and frequency based on real-time chip environment monitoring. Although ATM has already been shown to yield substantial performance benefits, its full potential has yet to be unlocked. In this paper, we investigate how to maximize ATM’s efficiency gain with a new means of exposing the inter-core speed variation: finetuning the ATM control loop. We conduct our analysis and evaluation on a production-grade POWER7+ system. On the POWER7+ server platform, we fine-tune the ATM control loop by programming its Critical Path Monitors, a key component of its ATM design that measures the cores’ timing margins. With a robust stress-test procedure, we expose over 200 MHz of inherent inter-core speed differential by fine-tuning the percore ATM control loop. Exploiting this differential, we manage to double the ATM frequency gain over the static timing margin; this is not possible using conventional means, i.e. by setting fixed <v, f> points for each core, because the corelevel <v, f> must account for chip-wide worst-case voltage variation. To manage the significant performance heterogeneity of fine-tuned systems, we propose application scheduling and throttling to manage the chip’s process and voltage variation. Our proposal improves application performance by more than 10% over the static margin, almost doubling the 6% improvement of the default, unmanaged ATM system. Our technique is general enough that it can be adopted by any system that employs an active timing margin control loop. Keywords-Active timing margin, Performance, Power efficiency, Reliability, Critical path monitors

[1]  Meeta Sharma Gupta,et al.  Understanding Voltage Variations in Chip Multiprocessors using a Distributed Power-Delivery Network , 2007, 2007 Design, Automation & Test in Europe Conference & Exhibition.

[2]  Meeta Sharma Gupta,et al.  An event-guided approach to reducing voltage noise in processors , 2009, 2009 Design, Automation & Test in Europe Conference & Exhibition.

[3]  Xiang Pan,et al.  VRSync: Characterizing and eliminating synchronization-induced voltage emergencies in many-core processors , 2012, 2012 39th Annual International Symposium on Computer Architecture (ISCA).

[4]  Li Zhou,et al.  Core tunneling: Variation-aware voltage noise mitigation in GPUs , 2016, 2016 IEEE International Symposium on High Performance Computer Architecture (HPCA).

[5]  José Luis Neves,et al.  IBM z14 design methodology enhancements in the 14-nm technology node , 2018, IBM J. Res. Dev..

[6]  Jingwen Leng,et al.  GPU voltage noise: Characterization and hierarchical smoothing of spatial and temporal voltage noise interference in GPU architectures , 2015, 2015 IEEE 21st International Symposium on High Performance Computer Architecture (HPCA).

[7]  Timothy J. Slegel,et al.  Robust power management in the IBM z13 , 2015, IBM J. Res. Dev..

[8]  Pradip Bose,et al.  Safe limits on voltage reduction efficiency in GPUs: A direct measurement approach , 2015, 2015 48th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO).

[9]  Keith A. Bowman,et al.  A 22nm dynamically adaptive clock distribution for voltage droop tolerance , 2012, 2012 Symposium on VLSI Circuits (VLSIC).

[10]  Keith A. Bowman,et al.  8.5 A 16nm auto-calibrating dynamically adaptive clock distribution for maximizing supply-voltage-droop tolerance across a wide operating range , 2015, 2015 IEEE International Solid-State Circuits Conference - (ISSCC) Digest of Technical Papers.

[11]  Josep Torrellas,et al.  Variation-Aware Application Scheduling and Power Management for Chip Multiprocessors , 2008, 2008 International Symposium on Computer Architecture.

[12]  J. Torrellas,et al.  VARIUS: A Model of Process Variation and Resulting Timing Errors for Microarchitects , 2008, IEEE Transactions on Semiconductor Manufacturing.

[13]  Meeta Sharma Gupta,et al.  Voltage emergency prediction: Using signatures to reduce operating margins , 2009, 2009 IEEE 15th International Symposium on High Performance Computer Architecture.

[14]  Lizy Kurian John,et al.  AUDIT: Stress Testing the Automatic Way , 2012, 2012 45th Annual IEEE/ACM International Symposium on Microarchitecture.

[15]  Saurabh Dighe,et al.  Adaptive Frequency and Biasing Techniques for Tolerance to Dynamic Temperature-Voltage Variations and Aging , 2007, 2007 IEEE International Solid-State Circuits Conference. Digest of Technical Papers.

[16]  Radu Teodorescu,et al.  Dynamic reduction of voltage margins by leveraging on-chip ECC in Itanium II processors , 2013, ISCA.

[17]  Gu-Yeon Wei,et al.  Process Variation Tolerant 3T1D-Based Cache Architectures , 2007, 40th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO 2007).

[18]  Krishna K. Rangan,et al.  Achieving uniform performance and maximizing throughput in the presence of heterogeneity , 2011, 2011 IEEE 17th International Symposium on High Performance Computer Architecture.

[19]  Reena Panda,et al.  Wait of a Decade: Did SPEC CPU 2017 Broaden the Performance Horizon? , 2018, 2018 IEEE International Symposium on High Performance Computer Architecture (HPCA).

[20]  Soraya Ghiasi,et al.  A Distributed Critical-Path Timing Monitor for a 65nm High-Performance Microprocessor , 2007, 2007 IEEE International Solid-State Circuits Conference. Digest of Technical Papers.

[21]  Kai Li,et al.  The PARSEC benchmark suite: Characterization and architectural implications , 2008, 2008 International Conference on Parallel Architectures and Compilation Techniques (PACT).

[22]  Indrani Paul,et al.  Ti-states: Processor power management in the temperature inversion region , 2016, 2016 49th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO).

[23]  Vivek Tiwari,et al.  Microarchitectural simulation and control of di/dt-induced power supply voltage variation , 2002, Proceedings Eighth International Symposium on High Performance Computer Architecture.

[24]  Samuel Naffziger,et al.  5.6 Adaptive clocking system for improved power efficiency in a 28nm x86-64 microprocessor , 2014, 2014 IEEE International Solid-State Circuits Conference Digest of Technical Papers (ISSCC).

[25]  Jaydeep P. Kulkarni,et al.  5.7 A graphics execution core in 22nm CMOS featuring adaptive clocking, selective boosting and state-retentive sleep , 2014, 2014 IEEE International Solid-State Circuits Conference Digest of Technical Papers (ISSCC).

[26]  Kai Li,et al.  PARSEC vs. SPLASH-2: A quantitative comparison of two multithreaded benchmark suites on Chip-Multiprocessors , 2008, 2008 IEEE International Symposium on Workload Characterization.

[27]  Shidhartha Das,et al.  Harnessing Voltage Margins for Energy Efficiency in Multicore CPUs , 2017, 2017 50th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO).

[28]  S. Naffziger,et al.  A 90nm variable-frequency clock system for a power-managed Itanium/sup /spl reg//-family processor , 2005, ISSCC. 2005 IEEE International Digest of Technical Papers. Solid-State Circuits Conference, 2005..

[29]  Malcolm Allen-Ware,et al.  26.2 Power supply noise in a 22nm z13™ microprocessor , 2017, 2017 IEEE International Solid-State Circuits Conference (ISSCC).

[30]  Alex E. Mericas Performance characteristics of the POWER8 processor , 2014, 2014 IEEE Hot Chips 26 Symposium (HCS).

[31]  Gu-Yeon Wei,et al.  Thread motion: fine-grained power management for multi-core systems , 2009, ISCA '09.

[32]  Meeta Sharma Gupta,et al.  Eliminating voltage emergencies via software-guided code transformations , 2010, TACO.

[33]  Meeta Sharma Gupta,et al.  Systematic Energy Characterization of CMP/SMT Processor Systems via Automated Micro-Benchmarks , 2012, 2012 45th Annual IEEE/ACM International Symposium on Microarchitecture.

[34]  Alper Buyuktosunoglu,et al.  Droop mitigation using critical-path sensors and an on-chip distributed power supply estimation engine in the z14™ enterprise processor , 2018, 2018 IEEE International Solid - State Circuits Conference - (ISSCC).

[35]  Meeta Sharma Gupta,et al.  DeCoR: A Delayed Commit and Rollback mechanism for handling inductive noise in processors , 2008, 2008 IEEE 14th International Symposium on High Performance Computer Architecture.

[36]  Jingwen Leng,et al.  Adaptive guardband scheduling to improve system-level efficiency of the POWER7+ , 2015, 2015 48th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO).

[37]  Daniel Friedman,et al.  A DPLL-based per core variable frequency clock generator for an eight-core POWER7™ microprocessor , 2010, 2010 Symposium on VLSI Circuits.

[38]  Trevor Mudge,et al.  Razor: a low-power pipeline based on circuit-level timing speculation , 2003, Proceedings. 36th Annual IEEE/ACM International Symposium on Microarchitecture, 2003. MICRO-36..

[39]  Pradip Bose,et al.  Voltage Noise in Multi-Core Processors: Empirical Characterization and Optimization Opportunities , 2014, 2014 47th Annual IEEE/ACM International Symposium on Microarchitecture.

[40]  Radu Teodorescu,et al.  Using ECC Feedback to Guide Voltage Speculation in Low-Voltage Processors , 2014, 2014 47th Annual IEEE/ACM International Symposium on Microarchitecture.

[41]  M.D. Powell,et al.  Pipeline damping: a microarchitectural technique to reduce inductive noise in supply voltage , 2003, 30th Annual International Symposium on Computer Architecture, 2003. Proceedings..

[42]  Saurabh Dighe,et al.  Within-die variation-aware dynamic-voltage-frequency scaling core mapping and thread hopping for an 80-core processor , 2010, 2010 IEEE International Solid-State Circuits Conference - (ISSCC).

[43]  Bishop Brock,et al.  Active Guardband Management in Power7+ to Save Energy and Maintain Reliability , 2013, IEEE Micro.

[44]  Gary D. Carpenter,et al.  Single-cycle, pulse-shaped critical path monitor in the POWER7+ microprocessor , 2013, International Symposium on Low Power Electronics and Design (ISLPED).

[45]  Dimitris Gizopoulos,et al.  Micro-Viruses for Fast System-Level Voltage Margins Characterization in Multicore CPUs , 2018, 2018 IEEE International Symposium on Performance Analysis of Systems and Software (ISPASS).

[46]  Bishop Brock,et al.  Active management of timing guardband to save energy in POWER7 , 2011, 2011 44th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO).

[47]  Michael D. Smith,et al.  Voltage Smoothing: Characterizing and Mitigating Voltage Noise in Production Processors via Software-Guided Thread Scheduling , 2010, 2010 43rd Annual IEEE/ACM International Symposium on Microarchitecture.