A first look at integrated GPUs for green high-performance computing

The graphics processing unit (GPU) has evolved from a single-purpose graphics accelerator into a tool that can greatly accelerate high-performance computing (HPC) applications. Previous studies have shown that discrete GPUs, while energy efficient for compute-intensive scientific applications, draw a great deal of power. In fact, a compute-capable discrete GPU can draw more than 200 watts by itself, which can be as much as an entire compute node without a GPU. This power draw is a serious roadblock to the adoption of GPUs in low-power environments such as embedded systems. Even in data centers, the power draw of a GPU is a problem: it increases the demand placed on support infrastructure such as cooling and power supply, which drives up cost. With the advent of compute-capable integrated GPUs that consume power in the tens of watts, we believe it is time to re-evaluate the notion that GPUs are power-hungry.

In this paper, we present the first evaluation of the energy efficiency of integrated GPUs for green HPC. We use four workloads, each representative of a different computational dwarf, and evaluate them on three platforms: a multicore system, a high-performance discrete GPU, and a low-power integrated GPU. We find that the integrated GPU delivers superior energy savings and a comparable energy-delay product (EDP) when compared to its discrete counterpart, and that it can still outperform the CPUs of a multicore system at a fraction of the power.
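To make the two metrics named above concrete, the following minimal sketch computes energy and EDP using the conventional definitions (energy = average power × runtime; EDP = energy × runtime). The function names and the power and runtime figures are hypothetical placeholders chosen for illustration only; they are not measurements from this paper.

```python
def energy_joules(avg_power_watts: float, runtime_seconds: float) -> float:
    """Energy consumed, assuming a constant average power over the run."""
    return avg_power_watts * runtime_seconds


def energy_delay_product(avg_power_watts: float, runtime_seconds: float) -> float:
    """EDP in joule-seconds: energy multiplied by runtime (delay)."""
    return energy_joules(avg_power_watts, runtime_seconds) * runtime_seconds


if __name__ == "__main__":
    # Hypothetical (average power in W, runtime in s) pairs, NOT measured data.
    platforms = {
        "discrete GPU (hypothetical)": (200.0, 10.0),
        "integrated GPU (hypothetical)": (20.0, 30.0),
    }
    for name, (power, runtime) in platforms.items():
        e = energy_joules(power, runtime)
        edp = energy_delay_product(power, runtime)
        print(f"{name}: energy = {e:.0f} J, EDP = {edp:.0f} J*s")
```

With these placeholder values, the slower but lower-power device uses less energy while its EDP stays in the same range, because EDP penalizes the longer runtime a second time.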
