A Machine Learning Solution for Data Center Thermal Characteristics Analysis

The energy efficiency of Data Center (DC) operations relies heavily on the DC ambient temperature as well as on the performance of its IT and cooling systems. A reliable and efficient cooling system is necessary to produce a persistent flow of cold air to cool servers that are subjected to a constantly increasing computational load due to the advent of smart cloud-based applications. Consequently, the increased demand for computing power will inevitably increase the waste heat generated by servers in data centers. To improve a DC thermal profile, which influences the energy efficiency and reliability of IT equipment, it is imperative to analyze the thermal characteristics of the IT room. This work employs an unsupervised machine learning technique to uncover weaknesses of a DC cooling system based on real DC thermal monitoring data. The analysis identifies areas for thermal management and cooling improvement, which in turn feed into DC recommendations. To identify overheated zones in a DC IT room and the corresponding servers, we analyzed the thermal characteristics of the IT room. The experimental dataset includes measurements of ambient air temperature in the hot aisle of the IT room at the ENEA Portici research center, which hosts the CRESCO6 computing cluster. We use machine learning clustering techniques to identify overheated locations and categorize computing nodes based on surrounding air temperature ranges derived from the data. The principles and approaches employed in this work are replicable for the analysis of thermal characteristics of any DC, thereby fostering transferability. This paper demonstrates how best practices and guidelines could be applied for thermal analysis and profiling of a commercial DC based on real thermal monitoring data.
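As an illustration of the clustering step described above, the following minimal sketch groups computing nodes into temperature ranges. It rests on several assumptions: the abstract does not name the clustering algorithm, so one-dimensional k-means (Lloyd's algorithm) is used here only as a representative choice; the node names, the temperature values, the number of clusters, and the labels "low", "medium", and "high" are all hypothetical and are not taken from the CRESCO6 dataset.

```python
from statistics import mean


def kmeans_1d(values, k=3, max_iter=100):
    """Cluster scalar temperatures into k groups with Lloyd's algorithm."""
    ordered = sorted(values)
    # Deterministic initialisation: pick starting centroids spread over the sorted data.
    centroids = [ordered[(2 * i + 1) * len(ordered) // (2 * k)] for i in range(k)]
    labels = []
    for _ in range(max_iter):
        # Assignment step: each value joins its nearest centroid.
        labels = [min(range(k), key=lambda c: abs(v - centroids[c])) for v in values]
        # Update step: each centroid moves to the mean of its members.
        new_centroids = []
        for c in range(k):
            members = [v for v, lab in zip(values, labels) if lab == c]
            new_centroids.append(mean(members) if members else centroids[c])
        if new_centroids == centroids:
            break  # converged: assignments can no longer change
        centroids = new_centroids
    return labels, centroids


def categorize_nodes(node_temps, names=("low", "medium", "high")):
    """Map each node to a temperature range, ordered by cluster centroid."""
    nodes = list(node_temps)
    temps = [node_temps[n] for n in nodes]
    labels, centroids = kmeans_1d(temps, k=len(names))
    # Rank clusters from coolest to hottest so that names follow temperature.
    order = sorted(range(len(names)), key=lambda c: centroids[c])
    cluster_name = {c: names[rank] for rank, c in enumerate(order)}
    return {n: cluster_name[lab] for n, lab in zip(nodes, labels)}


# Hypothetical mean hot-aisle air temperatures per node, in degrees Celsius.
readings = {
    "node01": 28.1, "node02": 29.0,
    "node03": 35.2, "node04": 36.0,
    "node05": 43.5, "node06": 44.1,
}
categories = categorize_nodes(readings)
overheated = sorted(n for n, name in categories.items() if name == "high")
print(categories)
print("Candidate overheated nodes:", overheated)
```

Nodes that fall into the hottest cluster are candidates for closer inspection of local cooling. In practice the number of clusters would be chosen from the data rather than fixed in advance.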
