Toward an End-to-End Auto-tuning Framework in HPC PowerStack

Efficiently utilizing procured power and optimizing the performance of scientific applications under power and energy constraints are challenging. The HPC PowerStack defines a software stack to manage the power and energy of high-performance computing systems and standardizes the interfaces between the stack's components. This survey paper presents the findings of a working group focused on end-to-end tuning of the PowerStack. First, we provide background on layer-specific tuning efforts in the PowerStack: their high-level objectives, constraints and optimization goals, layer-specific telemetry, and control parameters, along with a list of existing software solutions that address those challenges. Second, we propose the PowerStack end-to-end auto-tuning framework, identify opportunities for co-tuning different layers of the PowerStack, and present specific use cases and solutions. Third, we discuss the research opportunities and challenges for collective auto-tuning of two or more management layers (or domains) in the PowerStack. This paper takes the first steps in identifying and aggregating the important R&D challenges in streamlining optimization efforts across the layers of the PowerStack.
