Model-Based Thermal Anomaly Detection in Cloud Datacenters Using Thermal Imaging

The growing importance, large scale, and high server density of high-performance computing datacenters make them prone to attacks, misconfigurations, and failures (of the cooling as well as of the computing infrastructure). Such unexpected events often lead to thermal anomalies (hotspots, fugues, and coldspots), which increase the cost of operating datacenters. A model-based thermal anomaly detection mechanism is proposed that compares the expected thermal maps of a datacenter (obtained using heat-generation and heat-extraction models) with the observed thermal maps (obtained using thermal cameras). In addition, a novel Thermal Anomaly-aware Resource Allocation (TARA) scheme is designed to induce a time-varying thermal fingerprint (thermal map) of the datacenter so as to maximize the detection accuracy of the anomalies. Experiments on a small-scale testbed and trace-driven simulations show that this model-based thermal anomaly detection solution, used in conjunction with TARA, significantly improves the detection probability compared to anomaly detection under scheduling algorithms such as random, round robin, and best-fit-decreasing.
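The comparison of expected and observed thermal maps can be illustrated with a minimal sketch. It is not the paper's detection algorithm: the grid representation, the function name `detect_anomalies`, the fixed threshold `threshold_c`, and the "hotter"/"colder" labels are all assumptions made for illustration, and the expected map is supplied directly rather than computed from heat-generation and heat-extraction models.

```python
from typing import List, Tuple

# A thermal map is assumed here to be a 2-D grid of temperatures in degrees Celsius.
ThermalMap = List[List[float]]


def detect_anomalies(
    expected: ThermalMap,
    observed: ThermalMap,
    threshold_c: float = 5.0,  # hypothetical tolerance, not a value from the paper
) -> List[Tuple[int, int, str, float]]:
    """Flag grid cells whose observed temperature deviates from the expected one.

    Returns a list of (row, column, label, residual) tuples, where residual is
    observed minus expected and label is "hotter" or "colder".
    """
    anomalies = []
    for r, (exp_row, obs_row) in enumerate(zip(expected, observed)):
        for c, (exp_t, obs_t) in enumerate(zip(exp_row, obs_row)):
            residual = obs_t - exp_t
            if residual > threshold_c:
                anomalies.append((r, c, "hotter", residual))
            elif residual < -threshold_c:
                anomalies.append((r, c, "colder", residual))
    return anomalies


# Illustrative values only: a 2x2 map with one cell well above and one well below expectation.
expected_map = [[30.0, 30.0], [30.0, 30.0]]
observed_map = [[30.0, 37.0], [22.0, 31.0]]
flagged = detect_anomalies(expected_map, observed_map)
```

In this sketch, a cell is reported only when the residual exceeds the tolerance in either direction, so small model or measurement errors (such as the 1 °C deviation in the last cell) are ignored.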
