Understanding GPU Power

Modern graphics processing units (GPUs) have complex architectures that deliver exceptional performance and energy efficiency for high-throughput applications. Although GPUs consume large amounts of power, their use for high-throughput applications facilitates state-of-the-art energy efficiency and performance. Consequently, continued development relies on understanding their power consumption. This work is a survey of GPU power modeling and profiling methods, with increased detail on noteworthy efforts. Because direct measurement of GPU power is necessary for model evaluation and parameter initialization, internal and external power sensors are discussed. Hardware counters, which are low-level tallies of hardware events, correlate strongly with power use and performance. Statistical correlation between power and performance counters has yielded worthwhile GPU power models, yet the complexity inherent to GPU architectures presents new hurdles for power modeling. Developments and challenges of counter-based GPU power modeling are discussed. Research efforts in GPU power simulation, which often build on counter-based models and make power predictions from input code and hardware knowledge, provide opportunities for optimization in programming or architectural design. Noteworthy strides in power simulation for GPUs are included, along with their performance or functional simulator counterparts when appropriate. Finally, possible directions for future research are discussed.
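
As a rough illustration of the counter-based approach mentioned above, the following minimal sketch fits a linear model that maps counter rates to measured power by ordinary least squares. It is not taken from any surveyed work: the two counter names, the sample values, and the helper functions `fit_power_model` and `predict_power` are hypothetical, and the "measurements" are synthetic numbers generated from a known formula rather than real sensor readings.

```python
# Hypothetical sketch: counter-based power model fitted by least squares.
# Counter choices and all numbers are invented for illustration only.

def solve_linear_system(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):  # back substitution
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x


def fit_power_model(counter_rates, measured_watts):
    """Fit watts ~ w0 + sum_i w_i * rate_i using the normal equations."""
    rows = [[1.0] + list(r) for r in counter_rates]  # leading 1.0 = static term
    k = len(rows[0])
    xtx = [[sum(row[i] * row[j] for row in rows) for j in range(k)]
           for i in range(k)]
    xty = [sum(row[i] * y for row, y in zip(rows, measured_watts))
           for i in range(k)]
    return solve_linear_system(xtx, xty)


def predict_power(weights, rates):
    """Evaluate the fitted model for one set of counter rates."""
    return weights[0] + sum(w * r for w, r in zip(weights[1:], rates))


# Synthetic samples: (normalized instruction rate, normalized memory rate).
counter_rates = [(0.1, 0.2), (0.5, 0.1), (0.9, 0.3), (0.4, 0.8), (0.7, 0.6)]
# Synthetic "measurements" built from an assumed 40 W static term plus
# 60 W and 30 W per unit of each counter rate.
measured_watts = [40.0 + 60.0 * inst + 30.0 * mem for inst, mem in counter_rates]

weights = fit_power_model(counter_rates, measured_watts)
estimate = predict_power(weights, (1.0, 1.0))
```

Because the synthetic data follow the assumed linear formula exactly, the fit recovers its coefficients up to rounding error. Real counter data would be noisy and often nonlinear, which is one reason other statistical techniques are also applied to this problem.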
