Simple Power-Aware Scheduler to Limit Power Consumption by HPC System within a Budget

Future exascale systems are projected to require tens of megawatts. While facilities must provision sufficient power to realize peak performance, limited power availability will require power capping. Current approaches to power capping limit CPU power states and are agnostic to workload characteristics. Injudicious use of such mechanisms in an HPC system can severely degrade performance. We propose integrating power limiting into the job scheduler. We describe a power-aware scheduler that monitors power consumption, distributes the power budget among jobs, and implements a "uniform frequency" mechanism to limit power. We compare three implementations of uniform frequency. We show that power monitoring improves the probability of launching a job earlier, allows a job to run faster, and reduces stranded power. Our data show that the "auto mode" for uniform frequency operates at a 40% higher frequency than a fixed-frequency mode.
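The budget distribution and uniform-frequency selection summarized above can be illustrated with a minimal sketch. It is not the scheduler described in the paper. The proportional-by-node-count split, the `NODE_POWER_W` table and its wattages, and the names `Job`, `split_budget`, and `uniform_frequency` are all assumptions made for illustration.

```python
from dataclasses import dataclass

# Hypothetical per-node power draw (watts) at each available CPU frequency (GHz).
# Illustrative values only; they are not measurements from the paper.
NODE_POWER_W = {1.2: 180.0, 1.6: 210.0, 2.0: 250.0, 2.4: 300.0, 2.7: 350.0}


@dataclass
class Job:
    name: str
    nodes: int


def split_budget(system_budget_w, jobs):
    """Divide the system budget among jobs in proportion to node count.

    Proportional splitting is an assumption of this sketch, not a policy
    taken from the paper.
    """
    total_nodes = sum(job.nodes for job in jobs)
    return {job.name: system_budget_w * job.nodes / total_nodes for job in jobs}


def uniform_frequency(job, job_budget_w, power_table=NODE_POWER_W):
    """Return the highest frequency at which every node of the job fits the budget.

    All nodes of a job run at the same frequency. Returns None when even the
    lowest frequency exceeds the budget, meaning the job cannot launch.
    """
    feasible = [
        freq for freq, watts in power_table.items()
        if watts * job.nodes <= job_budget_w
    ]
    return max(feasible) if feasible else None


jobs = [Job("a", 4), Job("b", 12)]
budgets = split_budget(4000.0, jobs)
freqs = {job.name: uniform_frequency(job, budgets[job.name]) for job in jobs}
```

With these made-up numbers each job receives 250 W per node, so both jobs are assigned the 2.0 GHz step. A scheduler that also monitored actual draw could hand unused headroom back to other jobs, which is the kind of stranded-power reduction the abstract refers to.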
