ConSil: Low-Cost Thermal Mapping of Data Centers

Several projects involving high-level thermal management — such as eliminating “hot spots” or reducing cooling costs through intelligent workload placement — require ambient air temperature readings at a fine granularity. Unfortunately, current thermal instrumentation methods involve installing a set of expensive hardware sensors. Modern motherboards include multiple on-board sensors, but the values reported by these sensors are dominated by the thermal effects of the server’s workload. We propose using machine learning methods to model the effects of server workload on on-board sensors. Our models combine on-board sensor readings with workload instrumentation and “mask out” the thermal effects due to workload, leaving us with the ambient air temperature at that server’s inlet. We present a formal problem statement, outline the properties of our model, describe the machine learning approach we use to construct our models, and present ConSil, a prototype implementation.
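To make the "mask out" idea concrete, here is a minimal sketch under stated assumptions. It is not ConSil's actual model. It assumes that an on-board sensor reading is roughly the inlet temperature plus a workload-dependent offset, and it fits an ordinary least-squares linear model that predicts inlet temperature from the on-board reading and a single workload metric. The function names (`fit_inlet_model`, `estimate_inlet`), the use of CPU utilization as the only workload feature, the linear form, and the synthetic training data are all hypothetical choices made for illustration.

```python
# Hypothetical sketch: the names, the linear model, and the synthetic data
# are assumptions for illustration, not ConSil's actual method.

def solve_linear_system(a, b):
    """Solve a x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                for c in range(col, n + 1):
                    m[r][c] -= factor * m[col][c]
    return [m[i][n] / m[i][i] for i in range(n)]


def fit_inlet_model(samples):
    """Least-squares fit of inlet ~ w0 + w1*board_temp + w2*cpu_util.

    samples: list of (board_temp_c, cpu_util, inlet_temp_c) tuples. The inlet
    temperature is assumed to come from a reference sensor that is present
    only while training data is collected.
    """
    rows = [(1.0, board, util) for board, util, _ in samples]
    targets = [inlet for _, _, inlet in samples]
    # Normal equations: (X^T X) w = X^T y
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(3)] for i in range(3)]
    xty = [sum(r[i] * t for r, t in zip(rows, targets)) for i in range(3)]
    return solve_linear_system(xtx, xty)


def estimate_inlet(weights, board_temp_c, cpu_util):
    """Apply the fitted model to remove the workload contribution."""
    w0, w1, w2 = weights
    return w0 + w1 * board_temp_c + w2 * cpu_util


# Synthetic training data generated from an assumed relationship:
# board_temp = inlet + 10 + 20 * cpu_util
training = []
for inlet in (18.0, 22.0, 26.0):
    for util in (0.0, 0.5, 1.0):
        training.append((inlet + 10.0 + 20.0 * util, util, inlet))

weights = fit_inlet_model(training)
estimate = estimate_inlet(weights, board_temp_c=45.0, cpu_util=0.75)
print(weights, estimate)
```

On this synthetic data the fit recovers the generating relationship, so a 45 °C on-board reading at 75% utilization maps back to an inlet estimate of about 20 °C. Real sensors respond to workload with lag and nonlinearity, so a model of this shape should be read only as an illustration of the problem setup.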