Energy-Aware High Performance Computing - A Survey

Abstract: The power consumption of hardware and the energy efficiency of software have become major topics in High Performance Computing in recent years. Reaching the goal of 20 MW for an Exascale system requires a holistic approach: the efficiency of the data center itself, of the hardware components, and of the software must all be taken into account and optimized. We present the current state of hardware power management and sketch the next generation of hardware components. Furthermore, we present special HPC architectures with a strong focus on energy efficiency. Software efficiency is essential at all levels, from cluster management through system software to the applications running on the system. We present solutions for increasing efficiency at all of these levels: we discuss vendor tools for cluster management, tools and run-time systems that increase the efficiency of parallel applications, and algorithmic improvements. Finally, we present the eeClust project, which aims to reduce the energy consumption of HPC clusters through an integrated approach of application analysis, hardware management, and monitoring.
