Towards a Non-intrusive Recognition of Anomalous System Behavior in Data Centers

In this paper we propose a monitoring system for a data center that infers when the data center is entering an anomalous state by analyzing the power consumption of each server and the data center network traffic. The monitoring system is non-intrusive in the sense that no software needs to be installed on the data center servers. The monitoring architecture embeds two Elman recurrent neural networks (RNNs): one predicts the power consumed by each data center component from the network traffic, and the other predicts the network traffic from the power consumption. Results obtained over six months of experiments within a data center show that the architecture is able to distinguish anomalous system behaviors from normal ones by analyzing the error between the actual values of power consumption and network traffic and the values inferred by the two RNNs.
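The idea of comparing measured values against the output of an Elman network can be illustrated with a minimal sketch. It is not the authors' implementation: the class `ElmanRNN`, the helper `flag_anomalies`, the layer sizes, the random (untrained) weights, and the fixed error threshold are all assumptions made for illustration. The abstract does not specify how the networks are trained or how the error is turned into a classification.

```python
import math
import random


class ElmanRNN:
    """Hypothetical single-hidden-layer Elman network.

    h_t = tanh(W_x x_t + W_h h_{t-1} + b),  y_t = W_y h_t + c
    Weights are random here; a real system would fit them on normal-behavior data.
    """

    def __init__(self, n_in, n_hidden, n_out, seed=0):
        rng = random.Random(seed)

        def mat(rows, cols):
            return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]

        self.w_x = mat(n_hidden, n_in)      # input -> hidden
        self.w_h = mat(n_hidden, n_hidden)  # context (previous hidden) -> hidden
        self.b = [0.0] * n_hidden
        self.w_y = mat(n_out, n_hidden)     # hidden -> output
        self.c = [0.0] * n_out
        self.h = [0.0] * n_hidden           # context units, carried across steps

    def step(self, x):
        """Consume one input vector, update the context units, return the output vector."""
        new_h = []
        for i in range(len(self.h)):
            s = self.b[i]
            s += sum(w * v for w, v in zip(self.w_x[i], x))
            s += sum(w * v for w, v in zip(self.w_h[i], self.h))
            new_h.append(math.tanh(s))
        self.h = new_h
        return [sum(w * v for w, v in zip(row, self.h)) + c
                for row, c in zip(self.w_y, self.c)]


def flag_anomalies(predicted, actual, threshold):
    """Mark each time step whose absolute prediction error exceeds the (assumed) threshold."""
    return [abs(p - a) > threshold for p, a in zip(predicted, actual)]


# Two networks, one per direction (sizes are illustrative assumptions).
traffic_to_power = ElmanRNN(n_in=2, n_hidden=4, n_out=1, seed=1)
power_to_traffic = ElmanRNN(n_in=1, n_hidden=4, n_out=2, seed=2)

# Made-up, normalized samples: [inbound, outbound] traffic and measured power per step.
traffic = [[0.2, 0.1], [0.3, 0.2], [0.9, 0.8]]
measured_power = [0.25, 0.30, 0.95]

predicted_power = [traffic_to_power.step(x)[0] for x in traffic]
flags = flag_anomalies(predicted_power, measured_power, threshold=0.5)
```

With trained weights, a step would be flagged when the measured power departs from what the traffic-driven network expects, and symmetrically for `power_to_traffic`. Combining the two error signals into one decision is left open here, since the abstract does not describe it.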
