An Efficient Method for Anomaly Detection in Non-Stationary Data Streams

Anomaly detection in data streams has become a major research problem in the era of ubiquitous sensing. Large amounts of data are now collected from non-stationary environments, where the underlying distribution changes over time, and this makes traditional anomaly detection techniques ineffective. In this paper we propose an unsupervised cluster-based algorithm for modelling normal behaviour in non-stationary data streams and detecting anomalous data points. We show that the running time of our method scales linearly with the number of observed data points, while the complexity of our model is independent of the size of the data stream. We employ a selective clustering approach to reduce the computation time needed to model the normal data. Our experiments on large-scale synthetic and real-life datasets show that the accuracy of the proposed algorithm is comparable to that of state-of-the-art techniques reported in the literature, while providing substantial improvements in computation time.
