BladeCenter thermal diagnostics

An analytical technique called thermal diagnostics is presented as a tool for determining the root cause of thermal anomalies in electronic equipment. The technique combines a dynamically constructed flow network model, real-time inventory, temperature, and utilization metrics, and statistical hypothesis testing to select the most likely scenario from among thousands of potential causes of thermal problems. This paper describes the concept of thermal diagnostics and concludes with results from a laboratory evaluation in which we physically trigger thermal anomalies on a running IBM eServer™ BladeCenter® system and record the diagnosis given by the algorithm. In these tests, our algorithm correctly diagnosed the thermal situation and provided meaningful guidance toward clearing the detected problems.
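
The scenario-selection step can be illustrated with a minimal sketch. It is not the paper's algorithm: it assumes each candidate scenario already comes with a set of predicted sensor temperatures (in the paper these would be derived from the flow network model, which is not reproduced here), and it scores scenarios with a simple independent Gaussian error model. The scenario names, temperatures, and the `sigma` value are all hypothetical.

```python
import math

def log_likelihood(observed, predicted, sigma=2.0):
    """Gaussian log-likelihood of observed temperatures given a scenario's
    predicted temperatures. sigma (deg C) is an assumed sensor/model error."""
    ll = 0.0
    for sensor, t_obs in observed.items():
        resid = t_obs - predicted[sensor]
        ll += -0.5 * (resid / sigma) ** 2 - math.log(sigma * math.sqrt(2 * math.pi))
    return ll

def diagnose(observed, scenarios, sigma=2.0):
    """Score every candidate scenario and return the most likely one."""
    scores = {name: log_likelihood(observed, pred, sigma)
              for name, pred in scenarios.items()}
    best = max(scores, key=scores.get)
    return best, scores

# Hypothetical readings (deg C) from three blades.
observed = {"blade1": 48.0, "blade2": 61.0, "blade3": 50.0}

# Hypothetical predictions for each candidate cause; a real system would
# generate these from its flow network model and current inventory.
scenarios = {
    "nominal":              {"blade1": 47.0, "blade2": 49.0, "blade3": 49.0},
    "blade2_inlet_blocked": {"blade1": 48.0, "blade2": 60.0, "blade3": 50.0},
    "blower_failed":        {"blade1": 58.0, "blade2": 60.0, "blade3": 59.0},
}

best, scores = diagnose(observed, scenarios)
print(best)
```

With thousands of candidate causes, the same loop applies unchanged; the cost is dominated by generating each scenario's predicted temperatures rather than by the scoring itself.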
