Global Extensible Open Power Manager: A Vehicle for HPC Community Collaboration on Co-Designed Energy Management Solutions

The power scaling challenge posed by Exascale systems is well known. In this work, we introduce the Global Extensible Open Power Manager (GEOPM), a tree-hierarchical, open source runtime framework that we are contributing to the HPC community. Our aim is to foster collaboration and accelerate progress toward software-hardware co-designed energy management solutions that address Exascale power challenges and improve performance and energy efficiency on current systems. Through its plugin-extensible architecture, GEOPM enables rapid prototyping of new energy management strategies, and different plugins can be tailored to the specific performance or energy efficiency priorities of each HPC center. To demonstrate the framework's potential, we develop an example plugin for power rebalancing. It targets power-capped systems and improves efficiency by minimizing job time-to-solution within a power budget. Our results show improvements of up to 30% in the time-to-solution of CORAL system procurement benchmarks on a Xeon Phi cluster.
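
To make the idea of power rebalancing under a job-level budget concrete, the following is a minimal sketch of one possible heuristic. It is not GEOPM's plugin interface or the algorithm evaluated in this work. The function name `rebalance_power`, its parameters, and the proportional weighting rule are all assumptions chosen for illustration. The sketch shifts power toward nodes that took longer in the previous iteration while keeping the sum of per-node caps within the budget.

```python
def rebalance_power(caps, runtimes, budget, min_cap, max_cap):
    """Return new per-node power caps (watts) that sum to at most `budget`.

    Hypothetical heuristic, not GEOPM's method: each node is weighted by
    (current cap * last-iteration runtime), so nodes on the critical path
    receive a larger share of the budget.
    """
    n = len(caps)
    if n == 0 or len(runtimes) != n:
        raise ValueError("caps and runtimes must be non-empty and equal length")
    if budget < n * min_cap:
        raise ValueError("budget cannot cover the minimum cap on every node")

    weights = [c * t for c, t in zip(caps, runtimes)]
    total = sum(weights)
    if total <= 0:
        # No usable runtime signal: fall back to an even split.
        proposed = [budget / n] * n
    else:
        proposed = [budget * w / total for w in weights]

    # Respect the per-node limits (assumed hardware bounds).
    clamped = [min(max(p, min_cap), max_cap) for p in proposed]

    # Raising nodes to min_cap can overshoot the budget; if so, shrink the
    # headroom above min_cap uniformly until the budget is met exactly.
    excess = sum(clamped) - budget
    if excess > 0:
        headroom = [c - min_cap for c in clamped]
        scale = 1.0 - excess / sum(headroom)
        clamped = [min_cap + h * scale for h in headroom]
    return clamped


if __name__ == "__main__":
    # Two nodes at 100 W each; the second took three times as long.
    print(rebalance_power([100.0, 100.0], [1.0, 3.0], 200.0, 50.0, 150.0))
```

In this sketch the slower node ends up with the larger cap, and the total never exceeds the budget. A real plugin would also need to account for measurement noise and for the hierarchical aggregation across the tree, which this example omits.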
