Self-Organizing Maps for Detecting Abnormal Thermal Behavior in Data Centers

The increasing success of Cloud Computing applications and online services has contributed to making data center facilities unsustainable in terms of energy consumption. Higher resource demand has increased the electricity required by computing and cooling resources, leading to power shortages and outages, especially in urban infrastructures. Current energy reduction strategies for Cloud facilities usually disregard the data center topology, the contribution of cooling consumption, and the scalability of optimization strategies. Our work tackles the energy challenge by proposing a temperature-aware VM allocation policy based on a Trust-and-Reputation System (TRS). A TRS meets the requirements of inherently distributed environments such as data centers, and allows the implementation of autonomous and scalable VM allocation techniques. To this end, we model the relationships between the different computational entities and synthesize this information into a single metric. This metric, called reputation, is used to optimize the allocation of VMs in order to reduce energy consumption. We validate our approach with a state-of-the-art Cloud simulator using real Cloud traces. Our results show a considerable reduction in energy consumption, reaching up to 46.16% savings in computing power and 17.38% savings in cooling, without QoS degradation and while keeping servers below the thermal redline. Moreover, our results show the limitations of the PUE ratio as a metric for energy efficiency. To the best of our knowledge, this paper is the first approach to combining Trust-and-Reputation systems with Cloud Computing VM allocation.
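The abstract does not say how reputation is computed or how it drives placement. The following is a minimal sketch under stated assumptions, not the authors' policy: it substitutes a Beta reputation estimate, (good + 1) / (good + bad + 2), for each host, where good and bad count past allocations that did or did not stay below the thermal redline without QoS violations. Each VM is placed on the most reputable host that has enough free cores and whose predicted temperature stays below the redline. The `Host` fields, the linear thermal model (`temp_per_core_c`), the `REDLINE_C` value, and `allocate` are all hypothetical, and the sketch ignores the relationships between entities that the full TRS models.

```python
from dataclasses import dataclass
from typing import List, Optional

# Hypothetical thermal redline in degrees Celsius; not a value from the paper.
REDLINE_C = 85.0


@dataclass
class Host:
    name: str
    free_cores: int
    temp_c: float           # current CPU temperature
    temp_per_core_c: float  # hypothetical linear heating per allocated core
    good: int = 0           # past allocations without redlining or QoS violation
    bad: int = 0            # past allocations that violated either constraint

    def reputation(self) -> float:
        """Expected value of a Beta(good + 1, bad + 1) distribution."""
        return (self.good + 1) / (self.good + self.bad + 2)

    def predicted_temp(self, cores: int) -> float:
        """Hypothetical linear model of the temperature after adding `cores`."""
        return self.temp_c + self.temp_per_core_c * cores

    def record(self, ok: bool) -> None:
        """Update the evidence counters after observing an allocation outcome."""
        if ok:
            self.good += 1
        else:
            self.bad += 1


def allocate(hosts: List[Host], vm_cores: int) -> Optional[Host]:
    """Place a VM on the most reputable host that fits and stays below the redline."""
    candidates = [
        h for h in hosts
        if h.free_cores >= vm_cores and h.predicted_temp(vm_cores) <= REDLINE_C
    ]
    if not candidates:
        return None  # no thermally safe placement; caller may queue the VM
    # Highest reputation first; break ties with the cooler predicted temperature.
    best = max(candidates, key=lambda h: (h.reputation(), -h.predicted_temp(vm_cores)))
    best.temp_c = best.predicted_temp(vm_cores)
    best.free_cores -= vm_cores
    return best


if __name__ == "__main__":
    hosts = [
        Host("a", free_cores=8, temp_c=70.0, temp_per_core_c=2.0, good=9, bad=1),
        Host("b", free_cores=8, temp_c=60.0, temp_per_core_c=2.0, good=2, bad=2),
        Host("c", free_cores=2, temp_c=50.0, temp_per_core_c=2.0, good=20, bad=0),
    ]
    for _ in range(4):
        chosen = allocate(hosts, vm_cores=4)
        print(chosen.name if chosen else "no safe host")
```

In the demo, the first 4-core VM goes to host a (reputation 10/12); host c has the highest reputation but too few free cores. Once a second VM would push a past the redline, the next two VMs fall through to b, and the fourth finds no safe host. Ranking by reputation and treating the redline as a hard filter keeps the thermal constraint separate from the trust score. This is a design choice of the sketch only.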
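The claim about the limitations of PUE can be made concrete with a small calculation. PUE is total facility energy divided by IT equipment energy; the sketch below simplifies the facility total to IT plus cooling. It applies the reported maximum savings (46.16% in computing, 17.38% in cooling) to a hypothetical baseline of 100 units of IT energy and 50 units of cooling. Total energy falls while PUE rises, because the denominator shrinks faster than the cooling term. Combining the two "up to" figures in one scenario is itself an illustrative assumption.

```python
def pue(it_energy: float, cooling_energy: float) -> float:
    """Simplified PUE: (IT + cooling) / IT, ignoring other facility overheads."""
    return (it_energy + cooling_energy) / it_energy


# Hypothetical baseline in arbitrary energy units (not from the paper).
it_before, cooling_before = 100.0, 50.0

# Apply the reported maximum savings: 46.16% computing, 17.38% cooling.
it_after = it_before * (1 - 0.4616)
cooling_after = cooling_before * (1 - 0.1738)

print(f"PUE before: {pue(it_before, cooling_before):.3f}")
print(f"PUE after:  {pue(it_after, cooling_after):.3f}")
print(f"Total energy: {it_before + cooling_before:.2f} -> {it_after + cooling_after:.2f}")
```

The direction of the change does not depend on the baseline. Under this simplification, PUE = 1 + C/I. Scaling C by 0.8262 and I by 0.5384 multiplies C/I by about 1.53. Any optimization that cuts IT energy proportionally more than cooling energy therefore looks worse by PUE, even though it saves energy overall.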
