An efficient outlier detection approach on weighted data stream based on minimal rare pattern mining

Distance-based outlier detection methods identify outliers by computing the distances between points in a dataset, but their computational cost is particularly high on multidimensional datasets. In addition, traditional outlier detection methods do not consider how frequently subsets occur, so the detected outliers may not fit the definition of an outlier (i.e., something that rarely appears). Pattern mining-based outlier detection approaches address this problem, but they do not take the importance of each pattern into account, so the detected outliers may not reflect the actual situation. To address these problems, a two-phase outlier detection approach based on minimal weighted rare pattern mining, called MWRPM-Outlier, is proposed to effectively detect outliers on weighted data streams. In the pattern mining phase, a method called MWRPM quickly mines the minimal weighted rare patterns; in the outlier detection phase, two deviation factors are defined to measure the degree of abnormality of each transaction in the weighted data stream. Experimental results show that the proposed MWRPM-Outlier approach performs well in outlier detection and that the MWRPM method outperforms the compared methods in weighted rare pattern mining.
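
The two-phase idea can be illustrated with a minimal Python sketch. It is not the MWRPM algorithm or the paper's deviation factors, which the abstract does not specify. It assumes a hypothetical weighted support (mean item weight times the fraction of transactions containing the itemset), a brute-force level-wise search over one window of transactions, and a hypothetical single score that sums `1 - weighted_support` over the minimal weighted rare patterns a transaction contains. All function names, the threshold `min_wsup`, and the bound `max_len` are assumptions made for illustration.

```python
from itertools import combinations


def weighted_support(itemset, transactions, weights):
    """Assumed definition: mean item weight * fraction of transactions containing itemset."""
    mean_weight = sum(weights[i] for i in itemset) / len(itemset)
    count = sum(1 for t in transactions if itemset <= t)
    return mean_weight * count / len(transactions)


def minimal_weighted_rare_patterns(transactions, weights, min_wsup, max_len=3):
    """Phase 1 (brute force): itemsets below min_wsup with no rare proper subset."""
    items = sorted(set().union(*transactions))
    rare = {}
    for k in range(1, max_len + 1):  # level-wise, smallest itemsets first
        for combo in combinations(items, k):
            candidate = frozenset(combo)
            # A superset of an already found rare pattern cannot be minimal.
            if any(r < candidate for r in rare):
                continue
            ws = weighted_support(candidate, transactions, weights)
            if ws < min_wsup:
                rare[candidate] = ws
    return rare


def deviation_score(transaction, patterns):
    """Phase 2 (hypothetical score): rarer contained patterns contribute more."""
    return sum(1.0 - ws for p, ws in patterns.items() if p <= transaction)


# Toy window of four transactions with per-item weights (illustrative values).
transactions = [{"a", "b"}, {"a", "b"}, {"a", "b", "c"}, {"a", "c"}]
weights = {"a": 1.0, "b": 0.8, "c": 0.9}

patterns = minimal_weighted_rare_patterns(transactions, weights, min_wsup=0.3)
scores = [deviation_score(t, patterns) for t in transactions]
```

In this toy window the only minimal weighted rare pattern under the assumed definitions is {b, c}, so only the third transaction receives a nonzero score. Exhaustive enumeration is exponential in the number of items, so this sketch only conveys the structure of the two phases and says nothing about the efficiency the paper reports.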
