A Variable Markovian Based Outlier Detection Method for Multi-Dimensional Sequence over Data Stream

Sequence data increasingly takes the form of multi-dimensional sequences over data streams, which have a large state space and arrive at unprecedented speed. Designing a multi-dimensional sequence outlier detection method that is both accurate and fast is a major challenge. Traditional methods cannot handle multi-dimensional sequences effectively because they model such sequences poorly, and they cannot detect outliers in a timely manner because of their high computational complexity. In this paper we propose VMOD, a variable Markovian based outlier detection method for multi-dimensional sequences over data streams. VMOD consists of two algorithms: a mutual information based feature selection algorithm (MIFS) and a variable Markovian based sequential analysis algorithm (VMSA). MIFS reduces the state space and removes redundant features, and VMSA accelerates outlier detection; together they improve both the detection rate and the detection speed. The MIFS algorithm uses mutual information as the similarity measure and adopts a clustering-based strategy to select features. By reducing the state space and the redundant features, it improves sequence modeling and, consequently, the detection rate. The VMSA algorithm uses random sampling and an index structure to accelerate construction of the variable Markovian model and to reduce model complexity, which in turn speeds up outlier detection. Experiments show that VMOD detects outliers effectively and reduces detection time by at least 50% compared with traditional methods.
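
The idea of using mutual information as a similarity measure for feature selection can be illustrated with a minimal sketch. This is not the paper's MIFS algorithm, whose details the abstract does not give. It assumes discrete-valued features, estimates mutual information from empirical counts, and replaces the clustering step with a simple greedy grouping: a feature is kept only if it shares less than a chosen threshold of information with every feature already kept. The names `mutual_information`, `select_features`, and `threshold` are hypothetical.

```python
from collections import Counter
from math import log2


def mutual_information(xs, ys):
    """Empirical mutual information (in bits) of two equal-length discrete sequences."""
    n = len(xs)
    count_x = Counter(xs)
    count_y = Counter(ys)
    count_xy = Counter(zip(xs, ys))
    mi = 0.0
    for (x, y), c in count_xy.items():
        # p(x, y) * log2( p(x, y) / (p(x) * p(y)) ), written with raw counts
        mi += (c / n) * log2(c * n / (count_x[x] * count_y[y]))
    return mi


def select_features(columns, threshold):
    """Greedy stand-in for clustering (an assumption, not the paper's method).

    Walks the features in insertion order and keeps a feature only if its
    mutual information with every already-kept feature is below `threshold`.
    Each kept feature acts as the representative of one group of similar features.
    """
    kept = []
    for name, values in columns.items():
        if all(mutual_information(values, columns[k]) < threshold for k in kept):
            kept.append(name)
    return kept


# Toy data: "b" is the complement of "a" (fully redundant), "c" is independent of "a".
columns = {
    "a": [0, 1, 0, 1, 0, 1, 0, 1],
    "b": [1, 0, 1, 0, 1, 0, 1, 0],
    "c": [0, 0, 1, 1, 0, 0, 1, 1],
}
selected = select_features(columns, threshold=0.5)
print(selected)
```

On this toy data, `b` carries one full bit of information about `a` and is dropped as redundant, while `c` shares none and is kept. Dropping redundant dimensions in this way shrinks the joint state space that a downstream sequence model has to cover.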
