PShifter: feedback-based dynamic power shifting within HPC jobs for performance

The US Department of Energy (DOE) has set a power target of 20-30 MW for the first exascale machines. Achieving one exaFLOPS under this constraint requires managing power intelligently while maximizing performance. Most production-level parallel applications suffer from computational load imbalance across distributed processes due to non-uniform work decomposition, and other factors, such as manufacturing variation and thermal variation in the machine room, may amplify this imbalance. As a result, some processes of a job reach blocking calls, collectives, or barriers earlier than others and must wait for the rest to reach the same point. This waiting wastes energy and CPU cycles, degrading application efficiency and performance. We address this problem for power-limited jobs via Power Shifter (PShifter), a dual-level, feedback-based mechanism that automatically detects such imbalance and reduces it by dynamically redistributing a job's power budget across processors, improving the job's overall performance compared to a naïve uniform power distribution across nodes. In contrast to prior work, PShifter ensures that a given power budget is never violated. At the bottom level of PShifter, local agents monitor and control the performance of processors by actuating different power levels; they reduce power on the processors that incur substantial wait times. At the top level, the cluster agent, which has a global view of the system, monitors the job's power consumption and provides feedback on the unused power, which is then distributed across the processors of the same job. Our evaluation on an Intel cluster shows that, compared to uniform power allocation, PShifter improves performance by up to 21% and reduces energy consumption by up to 23%. It also outperforms static approaches by up to 40% and 22% for codes with and without phase changes, respectively, and outperforms dynamic schemes by up to 19%.
To the best of our knowledge, PShifter is the first approach that transparently and automatically applies non-uniform power caps across the processors of a job, dynamically adapting to phase changes.
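The dual-level feedback loop described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the names `Processor`, `local_agent_step`, `cluster_agent_step`, and `control_interval`, the 10% wait-fraction threshold, the 5 W step, and the 40-115 W per-processor cap range are all hypothetical. Each processor is assumed to report the fraction of the last control interval it spent waiting (a real runtime might obtain this by timing blocking MPI calls). The sketch bounds the sum of caps, rather than measured power draw, by the job budget; this is a conservative simplification that guarantees the budget is never exceeded.

```python
from dataclasses import dataclass
from typing import List

# Hypothetical per-processor cap limits in watts (assumption, not from the paper).
MIN_CAP_W = 40.0
MAX_CAP_W = 115.0
WAIT_THRESHOLD = 0.10  # hypothetical: "substantial" wait = more than 10% of the interval


@dataclass
class Processor:
    cap_w: float      # current power cap in watts
    wait_frac: float  # fraction of the last interval spent waiting at blocking calls


def local_agent_step(p: Processor, step_w: float = 5.0) -> None:
    """Bottom level: lower the cap of a processor that waits substantially."""
    if p.wait_frac > WAIT_THRESHOLD:
        p.cap_w = max(MIN_CAP_W, p.cap_w - step_w)


def cluster_agent_step(procs: List[Processor], job_budget_w: float) -> None:
    """Top level: hand the job's unused budget to processors that are not waiting.

    Each receiver gets at most an equal share of the unused power, so the
    sum of caps never rises above the job budget.
    """
    unused_w = job_budget_w - sum(p.cap_w for p in procs)
    receivers = [p for p in procs
                 if p.wait_frac <= WAIT_THRESHOLD and p.cap_w < MAX_CAP_W]
    if unused_w <= 0 or not receivers:
        return  # nothing to redistribute this interval
    share_w = unused_w / len(receivers)
    for p in receivers:
        p.cap_w = min(MAX_CAP_W, p.cap_w + share_w)


def control_interval(procs: List[Processor], job_budget_w: float) -> None:
    """One feedback round: local agents act first, then the cluster agent."""
    for p in procs:
        local_agent_step(p)
    cluster_agent_step(procs, job_budget_w)


if __name__ == "__main__":
    # Four processors start from a uniform 80 W split of a 320 W job budget.
    procs = [Processor(80.0, w) for w in (0.30, 0.02, 0.25, 0.01)]
    control_interval(procs, 320.0)
    print([p.cap_w for p in procs])  # [75.0, 85.0, 75.0, 85.0]
```

In this run, the two waiting processors drop to 75 W and the 10 W they free is split between the two processors on the critical path, so the job stays exactly at its budget. Giving unused power only to non-waiting processors is one plausible policy chosen for illustration; the abstract states only that unused power is distributed across the job's processors.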