Thermal Prediction for Efficient Energy Management of Clouds Using Machine Learning

Thermal management in hyper-scale cloud data centers is a critical problem. Increased host temperature creates hotspots that significantly increase cooling costs and affect reliability. Accurate prediction of host temperature is crucial for managing resources effectively. Temperature estimation is a non-trivial problem due to thermal variations in the data center. Existing solutions for temperature estimation are computationally expensive and lack prediction accuracy. Data-driven machine learning methods, by contrast, are a promising approach to temperature prediction. In this regard, we collect and study data from a private cloud and show the presence of thermal variations. We investigate several machine learning models to accurately predict the host temperature. Specifically, we propose a gradient boosting machine learning model for temperature prediction. The experimental results show that our model accurately predicts the temperature with an average RMSE of 0.05, or an average prediction error of 2.38 °C, which is 6 °C lower than that of an existing theoretical model. In addition, we propose a dynamic scheduling algorithm to minimize the peak temperature of hosts. The results show that our algorithm reduces the peak temperature by 6.5 °C and consumes 34.5 percent less energy than the baseline algorithm.
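
The abstract does not spell out the scheduling algorithm, so the following is only a minimal sketch of the general idea of prediction-driven, peak-temperature-aware placement. It assumes a simple greedy heuristic (not necessarily the one proposed in the paper), and `predict_temp` is a hypothetical linear stand-in for a trained temperature model; the names `place_vms`, `idle_temp`, and `span`, as well as the numeric constants, are illustrative assumptions.

```python
from typing import Callable, Dict, List, Tuple


def predict_temp(cpu_util_pct: float,
                 idle_temp: float = 40.0,
                 span: float = 35.0) -> float:
    """Hypothetical stand-in for a trained model (assumption, not from the paper).

    Maps CPU utilisation in percent (0..100) to a temperature in degrees Celsius
    using a plain linear relation.
    """
    clamped = min(max(cpu_util_pct, 0.0), 100.0)
    return idle_temp + span * clamped / 100.0


def place_vms(vm_loads: List[int],
              host_utils: Dict[str, int],
              predictor: Callable[[float], float] = predict_temp,
              ) -> Tuple[Dict[int, str], Dict[str, int]]:
    """Greedy placement: each VM goes to the host whose predicted temperature
    after placement is lowest, skipping hosts that would exceed 100% load."""
    utils = dict(host_utils)          # work on a copy, leave the input untouched
    placement: Dict[int, str] = {}

    # Place the heaviest VMs first so they get the coolest hosts.
    for vm in sorted(range(len(vm_loads)), key=lambda i: -vm_loads[i]):
        load = vm_loads[vm]
        candidates = [h for h in utils if utils[h] + load <= 100]
        if not candidates:
            raise ValueError(f"no host has capacity for VM {vm}")
        coolest = min(candidates, key=lambda h: predictor(utils[h] + load))
        utils[coolest] += load
        placement[vm] = coolest

    return placement, utils
```

Calling `place_vms([30, 20, 10], {"h1": 55, "h2": 20, "h3": 45})` returns the VM-to-host mapping together with the resulting per-host utilisation; the peak predicted temperature is then `max(predict_temp(u) for u in utils.values())`. In a real deployment the stand-in predictor would be replaced by the trained model, which would typically take more features than CPU utilisation alone.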
