Sustainable operation and management of data center chillers using temporal data mining

Motivation: Data centers are a critical component of modern IT infrastructure but are also among the worst environmental offenders, owing to their increasing energy usage and the resulting large carbon footprints. Efficient management of data centers, including power management, networking, and cooling infrastructure, is hence crucial to sustainability. In the absence of a 'first-principles' approach for managing these complex components and their interactions, data-driven approaches have become attractive and tenable.

Results: We present a temporal data mining solution to model and optimize the performance of data center chillers, a key component of the cooling infrastructure. It helps bridge raw, numeric, time-series information from sensor streams toward higher-level characterizations of chiller behavior that are suitable for a data center engineer. To aid in this transduction, temporal data streams are first encoded into a symbolic representation; next, run-length encoded segments are mined to form frequent motifs in the time series; and finally, these motifs are evaluated by their contributions to sustainability. A key innovation in our application is the ability to intersperse "don't care" transitions (e.g., transients) in continuous-valued time series data, an advantage inherited from applying frequent episode mining to symbolized representations of numeric time series. Our approach provides both qualitative and quantitative characterizations of the sensor streams to data center engineers, to help them tune chiller operating characteristics. This system is currently being prototyped for a data center managed by HP, and experimental results from this application reveal the promise of our approach.
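The first two stages of the pipeline described above (symbolic encoding, then run-length encoding) can be illustrated with a minimal Python sketch, followed by a simplified motif count. This is not the authors' implementation: the function names, the fixed breakpoint of 20, the two-letter alphabet, and the sample readings are all hypothetical. The last stage counts contiguous symbol sequences only, which is a simpler stand-in for the frequent episode mining with "don't care" transitions that the abstract refers to.

```python
from collections import Counter
from itertools import groupby


def symbolize(values, breakpoints, alphabet):
    """Map each numeric reading to a symbol.

    Assumption: fixed, sorted breakpoints chosen by hand; the alphabet
    must have len(breakpoints) + 1 symbols.
    """
    symbols = []
    for v in values:
        idx = sum(v >= b for b in breakpoints)  # number of breakpoints at or below v
        symbols.append(alphabet[idx])
    return "".join(symbols)


def run_length_encode(symbols):
    """Collapse runs of identical symbols into (symbol, run_length) pairs."""
    return [(s, len(list(group))) for s, group in groupby(symbols)]


def frequent_motifs(rle, length, min_count):
    """Count contiguous sequences of `length` run symbols and keep the frequent ones.

    Simplification: run durations are ignored and no gaps are allowed.
    """
    levels = [s for s, _ in rle]
    windows = (tuple(levels[i:i + length]) for i in range(len(levels) - length + 1))
    counts = Counter(windows)
    return {motif: c for motif, c in counts.items() if c >= min_count}


# Hypothetical sensor readings oscillating between a low and a high regime.
readings = [10, 11, 30, 31, 32, 12, 10, 29, 33, 11, 9, 31, 30]
encoded = symbolize(readings, breakpoints=[20], alphabet="ab")
runs = run_length_encode(encoded)
motifs = frequent_motifs(runs, length=2, min_count=3)
print(encoded)  # aabbbaabbaabb
print(runs)     # [('a', 2), ('b', 3), ('a', 2), ('b', 2), ('a', 2), ('b', 2)]
print(motifs)   # {('a', 'b'): 3}
```

Run-length encoding before mining means a motif describes a sequence of operating regimes rather than a sequence of individual samples, so a long dwell in one regime contributes a single element to a pattern.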
