Predictive Guardbanding: Program-Driven Timing Margin Reduction for GPUs

The energy efficiency of GPU architectures has emerged as an essential aspect of computer system design. In this article, we explore the energy benefits of reducing the GPU chip’s voltage to the safe limit, i.e., <inline-formula> <tex-math notation="LaTeX">$V_{\min }$ </tex-math></inline-formula> point, using predictive software techniques. We perform such a study on several commercial off-the-shelf GPU cards. We find that there exists about 20% voltage guardband on those GPUs spanning two architectural generations, which, if “eliminated” entirely, can result in up to 25% energy savings on one of the studied GPU cards. Our measurement results unveil a program dependent <inline-formula> <tex-math notation="LaTeX">$V_{\min }$ </tex-math></inline-formula> behavior across the studied applications, and the exact improvement magnitude depends on the program’s available guardband. We make fundamental observations about the program-dependent <inline-formula> <tex-math notation="LaTeX">$V_{\min }$ </tex-math></inline-formula> behavior. We experimentally determine that the voltage noise has a more substantial impact on <inline-formula> <tex-math notation="LaTeX">$V_{\min }$ </tex-math></inline-formula> compared to the process and temperature variation, and the activities during the kernel execution cause large voltage droops. From these findings, we show how to use kernels’ microarchitectural performance counters to predict its <inline-formula> <tex-math notation="LaTeX">$V_{\min }$ </tex-math></inline-formula> value accurately. The average and maximum prediction errors are 0.5% and 3%, respectively. The accurate <inline-formula> <tex-math notation="LaTeX">$V_{\min }$ </tex-math></inline-formula> prediction opens up new possibilities of a cross-layer dynamic guardbanding scheme for GPUs, in which software predicts and manages the voltage guardband, while the functional correctness is ensured by a hardware safety net mechanism.

[1]  Gary D. Carpenter,et al.  Single-cycle, pulse-shaped critical path monitor in the POWER7+ microprocessor , 2013, International Symposium on Low Power Electronics and Design (ISLPED).

[2]  Bishop Brock,et al.  Active management of timing guardband to save energy in POWER7 , 2011, 2011 44th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO).

[3]  T. N. Vijaykumar,et al.  Pipeline damping: a microarchitectural technique to reduce inductive noise in supply voltage , 2003, ISCA '03.

[4]  Radu Teodorescu,et al.  Dynamic reduction of voltage margins by leveraging on-chip ECC in Itanium II processors , 2013, ISCA.

[5]  Margaret Martonosi,et al.  Control techniques to eliminate voltage emergencies in high performance processors , 2003, The Ninth International Symposium on High-Performance Computer Architecture, 2003. HPCA-9 2003. Proceedings..

[6]  William V. Huott,et al.  On-chip Timing Uncertainty Measurements on IBM Microprocessors , 2008, 2008 IEEE International Test Conference.

[7]  Gu-Yeon Wei,et al.  Ivory: Early-stage design space exploration tool for integrated voltage regulators , 2017, 2017 54th ACM/EDAC/IEEE Design Automation Conference (DAC).

[8]  Jingwen Leng,et al.  Adaptive guardband scheduling to improve system-level efficiency of the POWER7+ , 2015, 2015 48th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO).

[9]  Xiang Pan,et al.  VRSync: Characterizing and eliminating synchronization-induced voltage emergencies in many-core processors , 2012, 2012 39th Annual International Symposium on Computer Architecture (ISCA).

[10]  Radu Teodorescu,et al.  Using ECC Feedback to Guide Voltage Speculation in Low-Voltage Processors , 2014, 2014 47th Annual IEEE/ACM International Symposium on Microarchitecture.

[11]  Jingwen Leng,et al.  GPU voltage noise: Characterization and hierarchical smoothing of spatial and temporal voltage noise interference in GPU architectures , 2015, 2015 IEEE 21st International Symposium on High Performance Computer Architecture (HPCA).

[12]  P.J. Restle,et al.  Timing uncertainty measurements on the Power5 microprocessor , 2004, 2004 IEEE International Solid-State Circuits Conference (IEEE Cat. No.04CH37519).

[13]  Radu Teodorescu,et al.  EmerGPU: Understanding and mitigating resonance-induced voltage noise in GPU architectures , 2016, 2016 IEEE International Symposium on Performance Analysis of Systems and Software (ISPASS).

[14]  Trevor Mudge,et al.  Razor: a low-power pipeline based on circuit-level timing speculation , 2003, Proceedings. 36th Annual IEEE/ACM International Symposium on Microarchitecture, 2003. MICRO-36..

[15]  Meeta Sharma Gupta,et al.  GPUVolt: Modeling and characterizing voltage noise in GPU architectures , 2014, 2014 IEEE/ACM International Symposium on Low Power Electronics and Design (ISLPED).

[16]  Behzad Salami,et al.  Exceeding Conservative Limits: A Consolidated Analysis on Modern Hardware Margins , 2020, IEEE Transactions on Device and Materials Reliability.

[17]  Kevin Skadron,et al.  Rodinia: A benchmark suite for heterogeneous computing , 2009, 2009 IEEE International Symposium on Workload Characterization (IISWC).

[18]  Keshav Pingali,et al.  A quantitative study of irregular programs on GPUs , 2012, 2012 IEEE International Symposium on Workload Characterization (IISWC).

[19]  Pradip Bose,et al.  Voltage Noise in Multi-Core Processors: Empirical Characterization and Optimization Opportunities , 2014, 2014 47th Annual IEEE/ACM International Symposium on Microarchitecture.

[20]  Chenming Hu,et al.  Characterization of spatial intrafield gate CD variability, its impact on circuit performance, and spatial mask-level correction , 2004, IEEE Transactions on Semiconductor Manufacturing.

[21]  Kevin Skadron,et al.  HotSpot: a compact thermal modeling methodology for early-stage VLSI design , 2006, IEEE Transactions on Very Large Scale Integration (VLSI) Systems.

[22]  Meeta Sharma Gupta,et al.  Understanding Voltage Variations in Chip Multiprocessors using a Distributed Power-Delivery Network , 2007, 2007 Design, Automation & Test in Europe Conference & Exhibition.

[23]  Xin He,et al.  Voltage-Stacked GPUs: A Control Theory Driven Cross-Layer Solution for Practical Voltage Stacking in GPUs , 2018, 2018 51st Annual IEEE/ACM International Symposium on Microarchitecture (MICRO).

[24]  Witold R. Rudnicki,et al.  Feature Selection with the Boruta Package , 2010 .

[25]  David Harris,et al.  CMOS VLSI Design: A Circuits and Systems Perspective , 2004 .

[26]  Vivek Tiwari,et al.  Microarchitectural simulation and control of di/dt-induced power supply voltage variation , 2002, Proceedings Eighth International Symposium on High Performance Computer Architecture.

[27]  Christopher Gill,et al.  Voltage-Stacked Power Delivery Systems: Reliability, Efficiency, and Power Management , 2020, IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems.

[28]  Yu Cao,et al.  Modeling of intra-die process variations for accurate analysis and optimization of nano-scale circuits , 2006, 2006 43rd ACM/IEEE Design Automation Conference.

[29]  Meeta Sharma Gupta,et al.  Voltage emergency prediction: Using signatures to reduce operating margins , 2009, 2009 IEEE 15th International Symposium on High Performance Computer Architecture.

[30]  Soraya Ghiasi,et al.  A Distributed Critical-Path Timing Monitor for a 65nm High-Performance Microprocessor , 2007, 2007 IEEE International Solid-State Circuits Conference. Digest of Technical Papers.

[31]  Michael D. Smith,et al.  Voltage Smoothing: Characterizing and Mitigating Voltage Noise in Production Processors via Software-Guided Thread Scheduling , 2010, 2010 43rd Annual IEEE/ACM International Symposium on Microarchitecture.

[32]  Geoffrey E. Hinton,et al.  ImageNet classification with deep convolutional neural networks , 2012, Commun. ACM.

[33]  A. Asenov,et al.  Where Do the Dopants Go? , 2005, Science.

[34]  Trevor Darrell,et al.  Caffe: Convolutional Architecture for Fast Feature Embedding , 2014, ACM Multimedia.

[35]  Lizy Kurian John,et al.  AUDIT: Stress Testing the Automatic Way , 2012, 2012 45th Annual IEEE/ACM International Symposium on Microarchitecture.

[36]  Keith A. Jenkins,et al.  Long-term NBTI degradation under real-use conditions in IBM microprocessors , 2014, Microelectron. Reliab..

[37]  Bin Li,et al.  Computer comparisons in the presence of performance variation , 2018, Frontiers of Computer Science.

[38]  Cristian Constantinescu,et al.  Silent Data Corruption - Myth or reality? , 2008, DSN.

[39]  Hidetoshi Onodera,et al.  Statistical Parameter Extraction for Intra- and Inter-Chip Variabilities of Metal-Oxide-Semiconductor Field-Effect Transistor Characteristics , 2005 .

[40]  Li Zhou,et al.  Core tunneling: Variation-aware voltage noise mitigation in GPUs , 2016, 2016 IEEE International Symposium on High Performance Computer Architecture (HPCA).

[41]  Minyi Guo,et al.  Asymmetric Resilience: Exploiting Task-Level Idempotency for Transient Error Recovery in Accelerator-Based Systems , 2020, 2020 IEEE International Symposium on High Performance Computer Architecture (HPCA).

[42]  Pradip Bose,et al.  Safe limits on voltage reduction efficiency in GPUs: A direct measurement approach , 2015, 2015 48th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO).

[43]  Nam Sung Kim,et al.  GPUWattch: enabling energy optimizations in GPGPUs , 2013, ISCA.

[44]  Osman S. Unsal,et al.  Modern Hardware Margins: CPUs, GPUs, FPGAs Recent System-Level Studies , 2019, 2019 IEEE 25th International Symposium on On-Line Testing and Robust System Design (IOLTS).