Empirical analysis of blower cooling failure in containment: Effects on IT performance

Data centers are prone to power outages and cooling failures. During such events, complex transport interactions take place between the cooling system and the IT equipment. Empirical data on this phenomenon are scarce in the current literature because of the size and complexity of such experiments. In this study, a facility-level data center blower cooling failure experiment is run and analyzed. Quantitative instrumentation covers pressure differentials, tile airflow, point air inlet temperatures, air inlet temperature contours, and IT IPMI data during failure and recovery. Qualitative measurements include IR imaging and airflow visualization via smoke trace. To our knowledge, this is the first experimental study in the literature in which an actual multi-aisle facility cooling failure is run with real IT load (compute, network, and storage) in the white space. This makes it possible to link variations at the facility level to those at the chip level. Results based on external air inlet temperature sensors indicate that the containment configuration has a longer uptime during failure. However, the IPMI data show the opposite: the RTT is reduced by ~70% when the external and internal sensors are compared. This occurs because external impedances formed by the containment during failure degrade the IT airflow systems. The inconsistency between IT IPMI inlet sensors and externally placed IT or rack inlet sensors (based on best practices) is expected to grow as airflow imbalances increase.
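The sensor comparison described in the abstract can be illustrated with a minimal sketch. It assumes that RTT is taken as the time from the start of the failure until an inlet temperature reading first reaches an allowable limit; the abstract does not state this definition. The function names, the sample readings, and the 32 °C limit are all hypothetical and are not measurements from the study.

```python
from typing import Optional, Sequence


def time_to_threshold(times_s: Sequence[float],
                      temps_c: Sequence[float],
                      limit_c: float) -> Optional[float]:
    """Return the first timestamp at which the temperature reaches limit_c.

    Returns None if the limit is never reached in the recorded window.
    """
    for t, temp in zip(times_s, temps_c):
        if temp >= limit_c:
            return t
    return None


def percent_reduction(reference: float, observed: float) -> float:
    """Percentage by which `observed` is smaller than `reference`."""
    return 100.0 * (reference - observed) / reference


# Hypothetical illustrative samples (seconds after failure, degrees Celsius).
times_s = [0, 60, 120, 180, 240, 300]
external_c = [20, 21, 23, 26, 30, 35]   # externally placed inlet sensor
ipmi_c = [20, 27, 33, 38, 42, 45]       # internal IPMI inlet sensor
limit_c = 32.0                          # hypothetical allowable inlet limit

rtt_external = time_to_threshold(times_s, external_c, limit_c)
rtt_ipmi = time_to_threshold(times_s, ipmi_c, limit_c)
reduction = percent_reduction(rtt_external, rtt_ipmi)

print(f"External-sensor RTT: {rtt_external} s")
print(f"IPMI-sensor RTT: {rtt_ipmi} s")
print(f"Reduction: {reduction:.0f}%")
```

With these made-up readings the internal sensor reaches the limit earlier than the external one, which is the kind of gap the abstract reports; the actual magnitude depends entirely on the measured data.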
