Exploring Energy Efficiency for GPU-Accelerated POWER Servers

Modern servers provide a range of features for managing the energy needed to execute a given workload. In this article we focus on a new generation of GPU-accelerated servers with POWER8 processors. For several scientific applications, all written for massively parallel computers, we measure energy-to-solution under different system configurations. By combining previously developed performance models with a simple power model, we derive an energy model that can help to optimise for energy efficiency.
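To illustrate the idea of combining a performance model with a power model, the following is a minimal sketch, not the model derived in the article. It assumes energy-to-solution is the product of average power and time-to-solution, and it uses two hypothetical placeholder models: a runtime with a frequency-dependent part and a fixed part, and a power draw that is static plus linear in clock frequency. All function names, parameters, and numbers are assumptions made for illustration only.

```python
def time_to_solution(work, freq_ghz, t_fixed=0.0):
    """Hypothetical performance model: compute time scales as work/frequency,
    plus a part (t_fixed) that does not depend on the clock frequency."""
    return work / freq_ghz + t_fixed


def power(freq_ghz, p_static, p_dyn_coeff):
    """Hypothetical power model: static power plus a term linear in frequency."""
    return p_static + p_dyn_coeff * freq_ghz


def energy_to_solution(work, freq_ghz, p_static, p_dyn_coeff, t_fixed=0.0):
    """Energy model: average power multiplied by time-to-solution."""
    return power(freq_ghz, p_static, p_dyn_coeff) * time_to_solution(
        work, freq_ghz, t_fixed
    )


def most_efficient_frequency(freqs, **params):
    """Return the candidate frequency with the lowest modelled energy."""
    return min(freqs, key=lambda f: energy_to_solution(freq_ghz=f, **params))


if __name__ == "__main__":
    # Illustrative numbers only; they are not measurements from the article.
    params = dict(work=100, p_static=200, p_dyn_coeff=50, t_fixed=40)
    for f in (2.0, 3.0, 4.0):
        print(f, energy_to_solution(freq_ghz=f, **params))
    print("lowest modelled energy at", most_efficient_frequency([2.0, 3.0, 4.0], **params))
```

Under these assumptions, the configuration with the lowest modelled energy is not necessarily the fastest one, because a higher clock raises power while shortening only the frequency-dependent part of the runtime.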
