Neural Network-Based Task Scheduling with Preemptive Fan Control

Because cooling accounts for a significant portion of the total operating cost of supercomputers, improving the efficiency of cooling mechanisms can substantially reduce that cost. This paper discusses two sources of cooling inefficiency in existing computing systems: temperature variations among cores and reactive fan speed control. To address these problems, we propose a learning-based approach with three components: a neural network model that accurately predicts core temperatures, a preemptive fan control mechanism, and a thermal-aware load balancing algorithm that uses the temperature prediction model. We demonstrate that temperature variations among cores can be reduced from 9°C to 2°C, and that peak fan power can be reduced by 61%. These savings are realized with minimal performance degradation.
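
To make the distinction between reactive and preemptive fan control concrete, the following is a minimal sketch, not the paper's implementation. It assumes a hypothetical linear fan curve (the `low`, `high`, `min_pct`, and `max_pct` values are illustrative) and takes the predicted temperature as an input, standing in for the output of a temperature prediction model such as the neural network described above.

```python
def fan_speed(temp_c, low=50.0, high=80.0, min_pct=20.0, max_pct=100.0):
    """Map a temperature (deg C) to a fan duty cycle (%) by linear interpolation.

    The curve and its thresholds are assumptions chosen for illustration.
    """
    if temp_c <= low:
        return min_pct
    if temp_c >= high:
        return max_pct
    frac = (temp_c - low) / (high - low)
    return min_pct + frac * (max_pct - min_pct)


def reactive_speed(measured_c):
    """Reactive control: respond only to the temperature measured right now."""
    return fan_speed(measured_c)


def preemptive_speed(measured_c, predicted_c):
    """Preemptive control: respond to the hotter of the measured and the
    predicted temperature, so the fan starts ramping before the core heats up.
    """
    return fan_speed(max(measured_c, predicted_c))


if __name__ == "__main__":
    measured, predicted = 55.0, 70.0  # hypothetical readings
    print("reactive:  ", round(reactive_speed(measured), 1))
    print("preemptive:", round(preemptive_speed(measured, predicted), 1))
```

In this sketch the preemptive controller raises the fan speed as soon as a temperature rise is predicted, whereas the reactive controller waits until the rise has been measured. How an earlier ramp translates into lower peak fan power depends on the thermal dynamics of the system, which this sketch does not model.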
