Thermal-Aware Power Capping Allocation Model for High Performance Computing Systems

High-performance computing (HPC) systems are large computing infrastructures that consume a massive amount of power during operation. Power capping is a feature of modern processor architectures that limits the power a processor may draw and thereby controls the performance of applications running on compute nodes. In this paper, we exploit the power capping capability of processors to develop a thermal-aware, energy-efficient model for HPC systems. Our model optimizes the energy consumption of HPC applications while ensuring that processor temperature remains within a limit. We execute various HPC applications and measure different characteristics of their execution (e.g., power, performance, and temperature). Based on these real-life measurements, we demonstrate that our proposed model is effective in achieving thermal-aware energy efficiency for HPC systems.
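The abstract states the model's objective only at a high level: minimize energy while keeping processor temperature within a limit. The sketch below is a simplified illustration of that objective, not the authors' model. It assumes an application has already been profiled at a few power caps and selects, by exhaustive search, the cap with the lowest measured energy (average power × runtime) among those whose peak temperature stays within the limit. The names `CapMeasurement`, `select_power_cap`, and `EXAMPLE_RUNS`, along with all numbers, are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class CapMeasurement:
    """One hypothetical profiling run of an application at a fixed power cap."""
    cap_watts: float        # processor power cap applied during the run
    avg_power_watts: float  # measured average processor power
    runtime_s: float        # measured execution time in seconds
    peak_temp_c: float      # measured peak processor temperature in deg C

    @property
    def energy_joules(self) -> float:
        # Energy = average power x execution time.
        return self.avg_power_watts * self.runtime_s


def select_power_cap(runs: Sequence[CapMeasurement],
                     temp_limit_c: float) -> Optional[CapMeasurement]:
    """Return the lowest-energy run whose peak temperature is within
    temp_limit_c, or None if no measured cap satisfies the limit."""
    # Thermal constraint: discard caps that overheat the processor.
    feasible = [r for r in runs if r.peak_temp_c <= temp_limit_c]
    if not feasible:
        return None
    # Energy objective: pick the cheapest remaining cap.
    return min(feasible, key=lambda r: r.energy_joules)


# Made-up numbers for illustration only; NOT measurements from the paper.
EXAMPLE_RUNS = [
    CapMeasurement(60.0, 58.0, 140.0, 61.0),   # 8120 J
    CapMeasurement(80.0, 75.0, 100.0, 69.0),   # 7500 J
    CapMeasurement(100.0, 90.0, 85.0, 77.0),   # 7650 J
    CapMeasurement(120.0, 104.0, 70.0, 88.0),  # 7280 J, but runs hot
]

if __name__ == "__main__":
    best = select_power_cap(EXAMPLE_RUNS, temp_limit_c=80.0)
    print(best.cap_watts if best else "no feasible cap")  # prints 80.0
```

With these invented numbers, the 120 W cap has the lowest energy overall but exceeds the 80 °C limit, so the 80 W cap is selected instead. This shows how a temperature constraint can shift the energy-optimal choice. A real model would likely predict power, runtime, and temperature as functions of the cap rather than rely only on a handful of measured points.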
