Small scale to extreme: Methods for characterizing energy efficiency in supercomputing applications

Abstract: Power measurement capabilities are becoming commonplace on large-scale HPC system deployments. Several approaches to power measurement are in use today, chiefly in-band and out-of-band measurement. Either technique can be augmented with application-level profiling, and the techniques can also be combined. However, it can be difficult to assess the type and level of detail of measurement needed to understand the power profile of an application. In addition, the heterogeneity of modern hybrid supercomputing platforms requires that different CPU architectures be examined as well. This paper presents a taxonomy for classifying power profiling techniques on modern HPC platforms. Three relevant HPC mini-applications are analyzed across systems of multicore and manycore nodes to examine the level of detail, scope, and complexity of these power profiles. We demonstrate that combining out-of-band measurement with in-band application region profiling can provide an accurate, detailed view of power usage without introducing overhead. Furthermore, we confirm the energy and power profiles of these mini-applications at extreme scale on the Trinity supercomputer. This finding validates the extrapolation of the power profiling techniques from a testbed scale of just several dozen nodes to extreme-scale petaflops supercomputing systems. We also provide a set of recommendations on how best to profile future HPC workloads.
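As an illustration of how out-of-band power samples might be combined with in-band region markers, the following minimal sketch integrates a power trace over the time window of each application region to estimate per-region energy. It is not the paper's implementation: the function names (`region_energy`, `per_region_energy`), the sample format (timestamp in seconds, power in watts), the assumption that the two clocks are already synchronized, and the linear interpolation between samples are all assumptions made for this example.

```python
from bisect import bisect_left


def region_energy(samples, start, end):
    """Estimate energy in joules used between start and end (seconds).

    samples: list of (timestamp_s, power_w) tuples with strictly
    increasing timestamps, e.g. from a hypothetical out-of-band meter.
    Power between samples is linearly interpolated (an assumption).
    """
    times = [t for t, _ in samples]

    def power_at(t):
        # Clamp to the first/last sample outside the measured window.
        i = bisect_left(times, t)
        if i == 0:
            return samples[0][1]
        if i == len(samples):
            return samples[-1][1]
        (t0, p0), (t1, p1) = samples[i - 1], samples[i]
        return p0 + (p1 - p0) * (t - t0) / (t1 - t0)

    # Region boundaries plus every sample strictly inside the region.
    points = [(start, power_at(start))]
    points += [(t, p) for t, p in samples if start < t < end]
    points.append((end, power_at(end)))

    # Trapezoidal integration of power over time gives energy.
    return sum((t1 - t0) * (p0 + p1) / 2.0
               for (t0, p0), (t1, p1) in zip(points, points[1:]))


def per_region_energy(samples, regions):
    """Map each region name to its estimated energy in joules.

    regions: dict of name -> (start_s, end_s), as might be recorded
    by in-band timestamps placed around application regions.
    """
    return {name: region_energy(samples, start, end)
            for name, (start, end) in regions.items()}


if __name__ == "__main__":
    trace = [(0.0, 100.0), (1.0, 100.0), (2.0, 200.0), (3.0, 200.0)]
    marks = {"comm": (0.0, 1.0), "solve": (0.5, 2.5)}
    print(per_region_energy(trace, marks))
```

In this arrangement the application only records region start and end times, while the power samples are collected externally, so the measurement itself adds little work to the application being profiled.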
