ALTM: Adaptive learning-based thermal model for temperature predictions in data centers

Designing effective control schemes for energy efficiency in data centers requires a thermal model of the system. Constructing thermal models of data centers for temperature prediction is extremely challenging because of the inherent complexity of these systems. Conventionally, such models are built from computational fluid dynamics (CFD) simulations or physical heat transfer equations. More recent approaches combine physical heat transfer rules with data-driven methods in an effort to obtain more accurate models. Our proposed adaptive learning-based thermal model (ALTM) is fast, adapts to thermal changes in the data center environment, and does not require prior knowledge of heat transfer rules between data center entities. Unlike other methods, ALTM is a holistic thermal model that predicts the temperature of critical zones using data center operational variables as inputs. These operational variables are the controllable parameters and easily obtained measurements from IT and cooling units. A key use case for ALTM is in thermal-aware workload schedulers or cooling system controllers. Our results confirm the accuracy and adaptability of the model.
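
To make the idea of an adaptive, data-driven thermal model concrete, the following minimal sketch maps operational variables to a zone temperature and keeps adapting as new measurements arrive. It is an illustration only and is not the ALTM architecture, which the abstract does not specify: the learner is a plain linear model updated by recursive least squares with a forgetting factor, and the three inputs (server utilization, cooling supply setpoint, fan speed), the class name OnlineThermalModel, the function run_demo, and the synthetic "room" functions are all hypothetical.

```python
import random


class OnlineThermalModel:
    """Linear temperature predictor updated by recursive least squares (RLS).

    Illustrative stand-in only; this is NOT the ALTM architecture.
    """

    def __init__(self, n_inputs, forgetting=0.98, init_cov=1000.0):
        n = n_inputs + 1  # one extra weight for the bias term
        self.w = [0.0] * n
        self.forgetting = forgetting  # < 1 discounts old samples
        # Large initial covariance means "no prior knowledge" of the weights.
        self.P = [[init_cov if i == j else 0.0 for j in range(n)]
                  for i in range(n)]

    def _features(self, u):
        return list(u) + [1.0]  # append constant input for the bias

    def predict(self, u):
        x = self._features(u)
        return sum(wi * xi for wi, xi in zip(self.w, x))

    def update(self, u, measured_temp):
        """Fold one (inputs, measured temperature) sample into the model."""
        x = self._features(u)
        n = len(x)
        Px = [sum(self.P[i][j] * x[j] for j in range(n)) for i in range(n)]
        denom = self.forgetting + sum(x[i] * Px[i] for i in range(n))
        gain = [v / denom for v in Px]
        error = measured_temp - self.predict(u)
        self.w = [wi + gi * error for wi, gi in zip(self.w, gain)]
        # P is symmetric, so x^T P equals Px and the update stays symmetric.
        self.P = [[(self.P[i][j] - gain[i] * Px[j]) / self.forgetting
                   for j in range(n)] for i in range(n)]
        return error


def run_demo(seed=0):
    """Train on one synthetic thermal behavior, then adapt to a changed one."""
    rng = random.Random(seed)
    model = OnlineThermalModel(n_inputs=3)

    def sample_inputs():
        # Hypothetical inputs: utilization, supply setpoint (deg C), fan speed.
        return [rng.uniform(0.0, 1.0), rng.uniform(18.0, 27.0),
                rng.uniform(0.2, 1.0)]

    def room_before(u):  # synthetic ground truth, assumed for the demo
        return 20.0 + 10.0 * u[0] + 0.5 * u[1] - 4.0 * u[2]

    def room_after(u):  # same room after a simulated thermal change
        return 25.0 + 6.0 * u[0] + 0.8 * u[1] - 2.0 * u[2]

    probe = [0.5, 22.0, 0.6]
    for _ in range(300):
        u = sample_inputs()
        model.update(u, room_before(u))
    err_trained = abs(model.predict(probe) - room_before(probe))
    err_stale = abs(model.predict(probe) - room_after(probe))

    for _ in range(500):
        u = sample_inputs()
        model.update(u, room_after(u))
    err_adapted = abs(model.predict(probe) - room_after(probe))
    return err_trained, err_stale, err_adapted
```

Calling run_demo() returns three absolute prediction errors at a fixed probe point: after initial training, immediately after the simulated thermal change (before any adaptation), and after the model has seen measurements from the changed environment. The forgetting factor is the design choice that lets old samples fade, so the model tracks the new behavior without being retrained from scratch.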
