Tree structured analysis on GPU power study

Graphics Processing Units (GPUs) have emerged as a promising platform for parallel computation. With a large number of processor cores and abundant memory bandwidth, GPUs deliver substantial computing power. This performance comes at a cost: a GPU consumes a large amount of power and needs adequate power supplies and cooling. An efficient mechanism is therefore needed for evaluating and understanding power consumption when real applications run on high-end GPUs. In this paper, we present a high-level GPU power consumption model based on tree-based random forest methods, which correlate power consumption with a set of performance variables and predict it from them. We demonstrate that this statistical model not only predicts GPU runtime power consumption more accurately than existing regression-based approaches but, more importantly, also gives insight into how GPU power consumption correlates with individual performance metrics. We use a GPU simulator that can collect more runtime performance metrics than hardware counters can. To collect statistical samples for power analysis, we measure the power consumption of a wide range of CUDA kernels on an experimental system with a GTX 280 GPU. The proposed method is applicable to other GPUs as well.
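
The idea of predicting power from performance metrics with a random forest, and then asking which metrics matter, can be illustrated with a minimal sketch. This is not the paper's implementation: the metric names (`alu_util`, `mem_bw_util`, `unrelated`), the synthetic power formula, and all parameter values are assumptions made up for illustration, and the forest is a small pure-Python stand-in for a full random forest library. It fits bagged regression trees with a random feature subset per split and ranks the metrics by permutation importance.

```python
import random

def mean(values):
    return sum(values) / len(values)

def sse(values):
    # Sum of squared deviations from the mean (0 for an empty list).
    if not values:
        return 0.0
    m = mean(values)
    return sum((v - m) ** 2 for v in values)

def build_tree(rows, ys, depth, n_try, rng):
    # A leaf is a float (mean power); an inner node is (feature, threshold, left, right).
    if depth == 0 or len(ys) < 4:
        return mean(ys)
    best = None
    # Consider only a random subset of features at each split.
    for f in rng.sample(range(len(rows[0])), n_try):
        for t in sorted({r[f] for r in rows})[1:]:
            left = [y for r, y in zip(rows, ys) if r[f] < t]
            right = [y for r, y in zip(rows, ys) if r[f] >= t]
            score = sse(left) + sse(right)
            if best is None or score < best[0]:
                best = (score, f, t)
    if best is None:  # every sampled feature was constant
        return mean(ys)
    _, f, t = best
    lo = [i for i, r in enumerate(rows) if r[f] < t]
    hi = [i for i, r in enumerate(rows) if r[f] >= t]
    return (f, t,
            build_tree([rows[i] for i in lo], [ys[i] for i in lo], depth - 1, n_try, rng),
            build_tree([rows[i] for i in hi], [ys[i] for i in hi], depth - 1, n_try, rng))

def predict_tree(node, x):
    while isinstance(node, tuple):
        f, t, left, right = node
        node = left if x[f] < t else right
    return node

def fit_forest(rows, ys, n_trees, depth, n_try, rng):
    forest = []
    n = len(rows)
    for _ in range(n_trees):
        # Each tree is trained on a bootstrap sample of the data.
        idx = [rng.randrange(n) for _ in range(n)]
        forest.append(build_tree([rows[i] for i in idx],
                                 [ys[i] for i in idx], depth, n_try, rng))
    return forest

def predict(forest, x):
    return mean([predict_tree(tree, x) for tree in forest])

def mse(forest, rows, ys):
    return mean([(predict(forest, r) - y) ** 2 for r, y in zip(rows, ys)])

def permutation_importance(forest, rows, ys, f, rng):
    # Increase in error after shuffling one metric's column.
    col = [r[f] for r in rows]
    rng.shuffle(col)
    shuffled = [r[:f] + [v] + r[f + 1:] for r, v in zip(rows, col)]
    return mse(forest, shuffled, ys) - mse(forest, rows, ys)

# Synthetic samples (NOT measured data): hypothetical metrics in [0, 1]
# and a made-up power formula in watts with Gaussian noise.
rng = random.Random(0)
names = ["alu_util", "mem_bw_util", "unrelated"]
rows = [[rng.random() for _ in names] for _ in range(120)]
power = [80 + 60 * r[0] + 40 * r[1] + rng.gauss(0, 2) for r in rows]

forest = fit_forest(rows, power, n_trees=15, depth=4, n_try=2, rng=rng)
fit_error = mse(forest, rows, power)       # training-set error only
baseline_error = sse(power) / len(power)   # error of always predicting the mean
importance = {name: permutation_importance(forest, rows, power, i, rng)
              for i, name in enumerate(names)}

print("training MSE:", fit_error, "baseline MSE:", baseline_error)
for name, score in sorted(importance.items(), key=lambda kv: -kv[1]):
    print(name, score)
```

On this synthetic data the two metrics that drive the made-up power formula should rank above the unrelated one, which is the kind of per-metric insight the abstract describes. The error reported here is measured on the training samples, so it says nothing about accuracy on unseen kernels; a real study would evaluate on held-out measurements.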
