Verified instruction-level energy consumption measurement for NVIDIA GPUs

GPUs are prevalent in modern computing systems at all scales, and they consume a significant fraction of the energy in these systems. However, vendors do not publish the actual power and energy costs of their internal microarchitecture. In this paper, we accurately measure the energy consumption of various PTX instructions found in modern NVIDIA GPUs. We provide an exhaustive comparison of more than 40 instructions for four high-end NVIDIA GPUs from four different generations (Maxwell, Pascal, Volta, and Turing). Furthermore, we show the effect of CUDA compiler optimizations on the energy consumption of each instruction. We use three different software techniques to read the GPU on-chip power sensors through NVIDIA's NVML API, and we provide an in-depth comparison of these techniques. Additionally, we verify the software measurement techniques against a custom-designed hardware power measurement. The results show that Volta GPUs have the best energy efficiency among the four generations across the different instruction categories. This work should aid in understanding the microarchitecture of NVIDIA GPUs. It should also make energy measurements of any GPU kernel both efficient and accurate.
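
As a rough illustration of how sampled sensor readings can be turned into a per-instruction energy figure, the sketch below integrates power samples over time and divides by an instruction count. It is not the measurement procedure used in this work. The function names, the trapezoidal integration, the optional idle-power subtraction, and the sampling period are all assumptions made for this example. The sampler assumes the `pynvml` bindings are installed and calls `nvmlDeviceGetPowerUsage`, which reports board power in milliwatts.

```python
import time


def energy_joules(samples):
    """Integrate (time_s, power_w) samples with the trapezoidal rule."""
    total = 0.0
    for (t0, p0), (t1, p1) in zip(samples, samples[1:]):
        total += 0.5 * (p0 + p1) * (t1 - t0)
    return total


def energy_per_instruction(samples, instruction_count, idle_power_w=0.0):
    """Hypothetical helper: energy above an idle baseline, per instruction.

    Subtracting idle power is an assumption of this sketch, not a step
    taken from the paper.
    """
    duration_s = samples[-1][0] - samples[0][0]
    dynamic_j = energy_joules(samples) - idle_power_w * duration_s
    return dynamic_j / instruction_count


def sample_gpu_power(duration_s, period_s=0.01, device_index=0):
    """Poll the on-chip power sensor through NVML for duration_s seconds.

    Requires an NVIDIA GPU and the pynvml package; imported lazily so the
    integration helpers above work without either.
    """
    import pynvml

    pynvml.nvmlInit()
    try:
        handle = pynvml.nvmlDeviceGetHandleByIndex(device_index)
        samples = []
        start = time.monotonic()
        while True:
            now = time.monotonic() - start
            watts = pynvml.nvmlDeviceGetPowerUsage(handle) / 1000.0  # mW -> W
            samples.append((now, watts))
            if now >= duration_s:
                break
            time.sleep(period_s)
        return samples
    finally:
        pynvml.nvmlShutdown()
```

In use, `sample_gpu_power` would run in a background thread while a kernel that repeats a single instruction executes, and the resulting samples would be passed to `energy_per_instruction` together with the number of instructions the kernel issued.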
