Using Machine Learning for Data Center Cooling Infrastructure Efficiency Prediction

Power consumption remains a critical concern for High Performance Computing (HPC) data centers. It becomes even more crucial for Exascale computing, since scaling today's fastest system to the Exaflop level would consume more than 168 MW of power, over 8 times the 20 MW power consumption goal set, at the time of this publication, by the US Department of Energy. This creates a need for energy efficiency improvements that encompass the full chain of power consumers, from the data center infrastructure, including cooling overheads and electrical losses, up to compute resource scheduling and application scaling. In this paper, a machine learning approach is proposed to model the Coefficient of Performance (COP) of an HPC data center's hot-water cooling loop. The suggested model is validated on operational data obtained at the Leibniz Supercomputing Centre (LRZ). The paper shows how this COP model can help improve the energy efficiency of modern HPC data centers.
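For orientation, the quantity being modeled can be sketched as follows. The abstract does not spell out the paper's exact COP definition, so this sketch assumes the conventional one: heat removed by the cooling loop divided by the electrical power spent on cooling. The function name and the sample values are hypothetical and are not taken from LRZ data.

```python
def coefficient_of_performance(heat_removed_kw: float, cooling_power_kw: float) -> float:
    """Return COP, assumed here to be heat removed / electrical cooling power."""
    if cooling_power_kw <= 0:
        raise ValueError("cooling_power_kw must be positive")
    return heat_removed_kw / cooling_power_kw


# Hypothetical (heat removed, cooling power) samples in kW; illustrative only.
samples = [(1200.0, 60.0), (900.0, 50.0)]
cops = [coefficient_of_performance(q, p) for q, p in samples]

for (q, p), cop in zip(samples, cops):
    print(f"heat removed = {q} kW, cooling power = {p} kW, COP = {cop}")
```

Under this assumed definition, a higher COP means less electrical power is spent per unit of heat removed, which is the sense in which a predictive COP model can inform efficiency decisions.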
