Uncore power scavenger: a runtime for uncore power conservation on HPC systems

The US Department of Energy (DOE) has set a power target of 20-30MW on the first exascale machines. To achieve one exaflop under this power constraint, it is necessary to minimize wasteful consumption of power while striving to improve performance. Toward this end, we investigate uncore frequency scaling (UFS) as a potential knob for reducing the power footprint of HPC jobs. We propose Uncore Power Scavenger (UPSCavenger), a runtime system that dynamically detects phase changes and automatically sets the best uncore frequency for every phase to save power without significant impact on performance. Our experimental evaluations on a cluster show that UPSCavenger achieves up to 10% energy savings with under 1% slowdown. It achieves 14% energy savings with the worst case slowdown of 5.5%. We also show that UPSCavenger achieves up to 20% speedup and proportional energy savings compared to Intel's RAPL with equivalent power usage making it a viable solution even for power-constrained computing.

[1]  Xi Chen,et al.  Up by their bootstraps: Online learning in Artificial Neural Networks for CMP uncore power management , 2014, 2014 IEEE 20th International Symposium on High Performance Computer Architecture (HPCA).

[2]  Martin Schulz,et al.  A Run-Time System for Power-Constrained HPC Applications , 2015, ISC.

[3]  Rahul Khanna,et al.  RAPL: Memory power estimation and capping , 2010, 2010 ACM/IEEE International Symposium on Low-Power Electronics and Design (ISLPED).

[4]  Allan Porterfield,et al.  An Adaptive Core-Specific Runtime for Energy Efficiency , 2017, 2017 IEEE International Parallel and Distributed Processing Symposium (IPDPS).

[5]  Osman Sarood Optimizing performance under thermal and power constraints for HPC data centers , 2014 .

[6]  Laxmikant V. Kalé,et al.  Optimizing power allocation to CPU and memory subsystems in overprovisioned HPC systems , 2013, 2013 IEEE International Conference on Cluster Computing (CLUSTER).

[7]  Wu-chun Feng,et al.  Towards energy-proportional computing for enterprise-class server workloads , 2013, ICPE '13.

[8]  Martin Schulz,et al.  Practical Resource Management in Power-Constrained, High Performance Computing , 2015, HPDC.

[9]  Dimitrios S. Nikolopoulos,et al.  Online power-performance adaptation of multithreaded programs using hardware event-based prediction , 2006, ICS '06.

[10]  Frank Mueller,et al.  PShifter: feedback-based dynamic power shifting within HPC jobs for performance , 2018, HPDC.

[11]  Martin Schulz,et al.  Bounding energy consumption in large-scale MPI programs , 2007, Proceedings of the 2007 ACM/IEEE Conference on Supercomputing (SC '07).

[12]  Thomas Ilsche,et al.  An Energy Efficiency Feature Survey of the Intel Haswell Processor , 2015, 2015 IEEE International Parallel and Distributed Processing Symposium Workshop.

[13]  Kai Ma,et al.  PGCapping: Exploiting power gating for power capping and core lifetime balancing in CMPs , 2012, 2012 21st International Conference on Parallel Architectures and Compilation Techniques (PACT).

[14]  Laxmikant V. Kale,et al.  Scheduling for HPC Systems with Process Variation Heterogeneity , 2014 .

[15]  Mary Jane Irwin,et al.  Core vs. uncore: The heart of darkness , 2015, 2015 52nd ACM/EDAC/IEEE Design Automation Conference (DAC).

[16]  Thomas F. Wenisch,et al.  Power management of online data-intensive services , 2011, 2011 38th Annual International Symposium on Computer Architecture (ISCA).

[17]  Martin Schulz,et al.  Exploring hardware overprovisioning in power-constrained, high performance computing , 2013, ICS '13.

[18]  Bronis R. de Supinski,et al.  Adagio: making DVS practical for complex HPC applications , 2009, ICS.

[19]  Laxmikant V. Kalé,et al.  Energy-efficient computing for HPC workloads on heterogeneous manycore chips , 2015, PMAM@PPoPP.

[20]  Vladimir Getov,et al.  PMPI: High-Level Message Passing in Fortran 77 and C , 1997, HPCN Europe.

[21]  Masha Sosonkina,et al.  Comparisons of core and uncore frequency scaling modes in quantum chemistry application GAMESS , 2018, SpringSim.

[22]  Vincent W. Freeh,et al.  Boosting Data Center Performance Through Non-Uniform Power Allocation , 2005, Second International Conference on Autonomic Computing (ICAC'05).

[23]  Pawel Gepner,et al.  Multi-Core Processors: New Way to Achieve High System Performance , 2006, PARELEC.

[24]  Martin Schulz,et al.  POW: System-wide Dynamic Reallocation of Limited Power in HPC , 2015, HPDC.

[25]  Sally A. McKee,et al.  Can hardware performance counters be trusted? , 2008, 2008 IEEE International Symposium on Workload Characterization.

[26]  D.K. Lowenthal,et al.  Adaptive, Transparent Frequency and Voltage Scaling of Communication Phases in MPI Programs , 2006, ACM/IEEE SC 2006 Conference (SC'06).

[27]  Feng Pan,et al.  Exploring the energy-time tradeoff in MPI programs on a power-scalable cluster , 2005, 19th IEEE International Parallel and Distributed Processing Symposium.

[28]  Yuichi Inadomi,et al.  Analyzing and mitigating the impact of manufacturing variability in power-constrained supercomputing , 2015, SC15: International Conference for High Performance Computing, Networking, Storage and Analysis.

[29]  Karsten Schwan,et al.  The Forgotten 'Uncore': On the Energy-Efficiency of Heterogeneous Cores , 2012, USENIX Annual Technical Conference.

[30]  Gabriel H. Loh,et al.  The Cost of Uncore in Throughput-Oriented Many-Core Processors , 2008 .

[31]  Chris Fallin,et al.  Memory power management via dynamic voltage/frequency scaling , 2011, ICAC '11.

[32]  David K. Lowenthal,et al.  Minimizing execution time in MPI programs on an energy-constrained, power-scalable cluster , 2006, PPoPP '06.

[33]  Ren Wang,et al.  Energy-efficient interconnect via Router Parking , 2013, 2013 IEEE 19th International Symposium on High Performance Computer Architecture (HPCA).

[34]  Vincent W. Freeh,et al.  Safe Overprovisioning: Using Power Limits to Increase Aggregate Throughput , 2004, PACS.

[35]  Fuat Keceli,et al.  Global Extensible Open Power Manager: A Vehicle for HPC Community Collaboration on Co-Designed Energy Management Solutions , 2017, ISC.

[36]  Laxmikant V. Kalé,et al.  Maximizing Throughput of Overprovisioned HPC Data Centers Under a Strict Power Budget , 2014, SC14: International Conference for High Performance Computing, Networking, Storage and Analysis.

[37]  Masha Sosonkina,et al.  Core and Uncore Joint Frequency Scaling Strategy , 2018 .

[38]  David K. Lowenthal,et al.  Using multiple energy gears in MPI programs on a power-scalable cluster , 2005, PPoPP.

[39]  David K. Lowenthal,et al.  Just In Time Dynamic Voltage Scaling: Exploiting Inter-Node Slack to Save Energy in MPI Programs , 2005, ACM/IEEE SC 2005 Conference (SC'05).

[40]  Wu-chun Feng,et al.  A Power-Aware Run-Time System for High-Performance Computing , 2005, ACM/IEEE SC 2005 Conference (SC'05).

[41]  Stephen L. Olivier,et al.  Standardizing Power Monitoring and Control at Exascale , 2016, Computer.

[42]  Frank Mueller,et al.  Power tuning HPC jobs on power-constrained systems , 2016, 2016 International Conference on Parallel Architecture and Compilation Techniques (PACT).

[43]  Diana Marculescu,et al.  Power-aware performance increase via core/uncore reinforcement control for chip-multiprocessors , 2012, ISLPED '12.

[44]  Dong Li,et al.  Hybrid MPI/OpenMP power-aware computing , 2010, 2010 IEEE International Symposium on Parallel & Distributed Processing (IPDPS).

[45]  Rong Ge,et al.  CPU MISER: A Performance-Directed, Run-Time System for Power-Aware Clusters , 2007, 2007 International Conference on Parallel Processing (ICPP 2007).

[46]  Martin Schulz,et al.  Finding the limits of power-constrained application performance , 2015, SC15: International Conference for High Performance Computing, Networking, Storage and Analysis.