A fast noise resilient anomaly detection using GMM-based collective labelling

Anomaly detection algorithms face several challenges, including computational complexity and resilience to noise in the input data. In this paper, we propose a fast and noise-resilient cluster-based anomaly detection method that uses a collective labelling approach. In the proposed Collective Probabilistic Anomaly Detection method, first, instead of labelling each new sample (as normal or anomalous) individually, the new samples are clustered and then labelled as a group. This collective labelling mitigates the negative impact of noise by relying on group behaviour rather than on the individual characteristics of incoming samples. Second, since grouping and labelling new samples may be time-consuming, we summarize clusters using a Gaussian Mixture Model (GMM). Not only does the GMM offer faster processing; it also allows clusters of arbitrary shape to be summarized, which reduces the memory requirement. Finally, a modified distance measure, based on the Kullback-Leibler divergence, is proposed to calculate the similarity among clusters represented by GMMs. We evaluate the proposed method on various datasets by measuring its false alarm rate, detection rate and memory requirement. We also add different levels of noise to the input datasets to demonstrate the performance of the proposed collective anomaly detection method in the presence of noise. The experimental results confirm the superior performance of the proposed method compared to individual labelling techniques in terms of memory usage, detection rate and false alarm rate.
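
The collective labelling idea can be illustrated with a minimal sketch. The abstract does not specify the modified distance measure, so the code below substitutes a standard matching-based approximation of the Kullback-Leibler divergence between Gaussian mixtures; it is not the authors' measure. The function names, the single-Gaussian cluster summary, the reference model and the threshold value are all assumptions made for illustration.

```python
import numpy as np

def kl_gauss(mu0, cov0, mu1, cov1):
    """Closed-form KL(N0 || N1) between two multivariate Gaussians."""
    k = mu0.shape[0]
    inv1 = np.linalg.inv(cov1)
    diff = mu1 - mu0
    _, logdet0 = np.linalg.slogdet(cov0)
    _, logdet1 = np.linalg.slogdet(cov1)
    return 0.5 * (np.trace(inv1 @ cov0) + diff @ inv1 @ diff - k + logdet1 - logdet0)

def kl_gmm_matched(f, g):
    """Matching-based approximation of KL(f || g).

    Each GMM is a tuple (weights, means, covariances). Every component of f
    is matched to the component of g that minimises the pairwise cost.
    """
    wf, mf, cf = f
    wg, mg, cg = g
    total = 0.0
    for a, mu_a, cov_a in zip(wf, mf, cf):
        costs = [kl_gauss(mu_a, cov_a, mu_b, cov_b) + np.log(a / b)
                 for b, mu_b, cov_b in zip(wg, mg, cg)]
        total += a * min(costs)
    return total

def summarize(samples):
    """Summarise a cluster as a one-component GMM (assumption: one Gaussian suffices)."""
    mu = samples.mean(axis=0)
    # Small ridge keeps the covariance invertible for tight clusters.
    cov = np.cov(samples, rowvar=False) + 1e-6 * np.eye(samples.shape[1])
    return (np.array([1.0]), [mu], [cov])

def label_cluster(samples, normal_model, threshold):
    """Label a whole cluster at once by its divergence from the normal model."""
    d = kl_gmm_matched(summarize(samples), normal_model)
    return ("anomaly" if d > threshold else "normal"), d

# Hypothetical reference model of normal behaviour: two unit-variance components.
normal_model = (np.array([0.5, 0.5]),
                [np.array([0.0, 0.0]), np.array([5.0, 5.0])],
                [np.eye(2), np.eye(2)])

rng = np.random.default_rng(0)
normal_batch = rng.normal(loc=[0.0, 0.0], scale=1.0, size=(200, 2))
shifted_batch = rng.normal(loc=[10.0, -10.0], scale=1.0, size=(200, 2))

THRESHOLD = 5.0  # hypothetical value; in practice it would be tuned on data
print(label_cluster(normal_batch, normal_model, THRESHOLD)[0])
print(label_cluster(shifted_batch, normal_model, THRESHOLD)[0])
```

Because the label is computed from the mean and covariance of the whole cluster, a few noisy samples shift the summary only slightly, which is the intuition behind the noise resilience claimed above. Only the component parameters need to be stored, not the raw samples.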
