Toward Automated Calibration of Data Center Digital Twins: A Neural Surrogate Approach

Evolving a computational fluid dynamics (CFD) model into a high-fidelity digital twin is desirable for industrial data center management. However, existing calibration approaches for improving CFD model accuracy require either excessive manual tuning or intensive computation, so they do not scale with system size and complexity. This paper presents a surrogate-based approach that automates the calibration of CFD models built for industrial data centers. Specifically, a knowledge-based graph neural network (GNN) is trained to approximate the CFD model, serving as a surrogate that captures the key thermal variables and their causal relationships in a given data hall. Integrating prior knowledge as constraints reduces the amount of training data the GNN requires. After rounds of training, the neural surrogate recommends configurations for the hard-to-obtain CFD model parameters such that the temperatures predicted by the CFD model are most consistent with the actual measurements. In experiments that apply the proposed approach to two CFD models built for two production data halls hosting thousands of servers, the calibrated models achieve temperature prediction errors of $0.81^\circ$C and $0.75^\circ$C with about $30$ hours of computation on a quad-core virtual machine in the cloud.
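The surrogate-based calibration loop described above can be illustrated with a minimal sketch. It is not the paper's method: the knowledge-based GNN is replaced here by a degree-2 polynomial least-squares fit, the CFD solver by a hypothetical two-sensor function `toy_simulator`, and the parameter search by a plain grid search. The parameter ranges, the loads, and the use of mean absolute error as the mismatch measure are all assumptions made for illustration.

```python
import numpy as np


def toy_simulator(params, load):
    """Hypothetical stand-in for an expensive CFD run (not a real CFD model).

    Maps two unknown model parameters and a server load to two temperatures.
    """
    p0, p1 = params
    return np.array([
        18.0 + 4.0 * load * p0 + 0.5 * p1,
        20.0 + 2.0 * load * p1 + 0.3 * p0 * p1,
    ])


def features(params, load):
    """Degree-2 polynomial features of (p0, p1, load) for the surrogate."""
    p0, p1 = params
    return np.array([1.0, p0, p1, load, p0 * load, p1 * load,
                     p0 * p1, p0 ** 2, p1 ** 2, load ** 2])


def fit_surrogate(n_runs=200, seed=0):
    """Sample simulator runs and fit the surrogate by least squares."""
    rng = np.random.default_rng(seed)
    params = rng.uniform(0.0, 2.0, size=(n_runs, 2))  # assumed parameter range
    loads = rng.uniform(0.2, 1.0, size=n_runs)        # assumed load range
    X = np.array([features(p, l) for p, l in zip(params, loads)])
    Y = np.array([toy_simulator(p, l) for p, l in zip(params, loads)])
    coef, *_ = np.linalg.lstsq(X, Y, rcond=None)
    return coef  # shape (10, 2): one column per temperature sensor


def surrogate_predict(coef, params, load):
    """Cheap approximation of toy_simulator(params, load)."""
    return features(params, load) @ coef


def calibrate(coef, loads, measured, n_grid=101):
    """Grid-search the parameters whose surrogate predictions best match
    the measurements, using mean absolute error as the mismatch."""
    grid = np.linspace(0.0, 2.0, n_grid)
    best, best_err = None, np.inf
    for p0 in grid:
        for p1 in grid:
            pred = np.array([surrogate_predict(coef, (p0, p1), l)
                             for l in loads])
            err = float(np.mean(np.abs(pred - measured)))
            if err < best_err:
                best, best_err = (float(p0), float(p1)), err
    return best, best_err


# "Measurements" are generated from hidden true parameters for this demo.
true_params = (0.6, 1.4)
loads = [0.3, 0.6, 0.9]
measured = np.array([toy_simulator(true_params, l) for l in loads])

coef = fit_surrogate()
calibrated, mae = calibrate(coef, loads, measured)
print("calibrated parameters:", calibrated, "surrogate MAE:", mae)
```

Because the search runs entirely on the surrogate, the costly simulator is only called when generating training data. That separation is the property the abstract relies on, although the actual surrogate and search strategy in the paper may differ substantially from this toy.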
