MiFI-Outlier: Minimal infrequent itemset-based outlier detection approach on uncertain data stream

Abstract

A large number of outlier detection approaches have been proposed for static datasets over the past twenty years, and they have achieved good results. In real applications, uncertain data streams are increasingly common, but most existing outlier detection approaches are not suitable for the uncertain data stream environment. In addition, many outlier detection approaches do not consider how often each element occurs, so the outliers they detect may not agree with the definition of an outlier. Itemset-based outlier detection approaches provide a good solution to this problem and have received increasing attention in recent years. In this paper, a novel two-step minimal infrequent itemset-based outlier detection approach called MiFI-Outlier is proposed to effectively detect outliers in uncertain data streams. In the itemset mining phase, a matrix-based method called MiFI-UDSM is proposed to mine minimal infrequent itemsets (MiFIs) from uncertain data streams, and an improved approach called MiFI-UDSM* is then proposed to mine these itemsets more effectively using the ideas of “item cap” and “support cap”. In the outlier detection phase, three deviation indices are defined on the basis of the mined MiFIs to measure the deviation degree of each transaction: the minimal infrequent itemset deviation index (MiFIDI), the similarity deviation index (SDI) and the transaction deviation index (TDI). MiFI-Outlier then uses these indices to identify outliers in the uncertain data stream. Experimental studies on public and synthetic datasets show that the proposed approaches outperform the compared approaches in both the infrequent itemset mining phase and the outlier detection phase.
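The abstract does not define MiFIs over uncertain data. The sketch below assumes a common convention: each transaction assigns an existence probability to each item, items are independent, and an itemset's expected support over the current window is the sum over transactions of the product of its items' probabilities. Under this convention, an itemset is a minimal infrequent itemset if its expected support falls below a threshold while all of its proper subsets are frequent. The sketch substitutes a plain Apriori-style level-wise search for the paper's matrix-based MiFI-UDSM and leaves out the “item cap” and “support cap” optimizations. The names `expected_support`, `minimal_infrequent_itemsets`, `window` and `min_esup` are hypothetical, as is the example data.

```python
from itertools import combinations
from math import prod


def expected_support(itemset, window):
    """Expected support of `itemset` over a window of uncertain transactions.

    Assumption: each transaction is a dict mapping item -> existence probability,
    and items are independent, so the probability that a transaction contains the
    whole itemset is the product of its item probabilities (0 if any is absent).
    """
    total = 0.0
    for t in window:
        if all(i in t for i in itemset):
            total += prod(t[i] for i in itemset)
    return total


def minimal_infrequent_itemsets(window, min_esup):
    """Level-wise (Apriori-style) search for minimal infrequent itemsets.

    An itemset is a MiFI if its expected support is below `min_esup` while every
    proper subset is frequent. Expected support is anti-monotone, so it suffices
    to test candidates whose (k-1)-subsets are all frequent.
    """
    items = sorted({i for t in window for i in t})
    mifis = []
    frequent = []  # frequent itemsets of the current level, as sorted tuples
    for i in items:
        if expected_support((i,), window) < min_esup:
            mifis.append(frozenset((i,)))  # infrequent single item is minimal
        else:
            frequent.append((i,))

    k = 2
    while frequent:
        freq_set = set(frequent)
        candidates = set()
        # Join frequent (k-1)-itemsets that share their first k-2 items.
        for a, b in combinations(frequent, 2):
            if a[:-1] == b[:-1]:
                cand = tuple(sorted(set(a) | set(b)))
                # Keep the candidate only if all of its (k-1)-subsets are frequent.
                if all(s in freq_set for s in combinations(cand, k - 1)):
                    candidates.add(cand)
        frequent = []
        for cand in sorted(candidates):
            if expected_support(cand, window) < min_esup:
                mifis.append(frozenset(cand))  # infrequent, all subsets frequent
            else:
                frequent.append(cand)
        k += 1
    return mifis


# Hypothetical sliding window of four uncertain transactions.
window = [
    {"a": 0.9, "b": 0.8},
    {"a": 0.7, "c": 0.5},
    {"a": 0.6, "b": 0.9, "c": 0.4},
    {"b": 0.5},
]
mifis = minimal_infrequent_itemsets(window, min_esup=1.3)
print([sorted(m) for m in mifis])  # → [['c'], ['a', 'b']]
```

In this example, {c} has expected support 0.9 and {a, b} has 0.72 + 0.54 = 1.26. Both fall below 1.3, while every proper subset of each is frequent, so both are reported. In a streaming setting, the window would be updated as transactions arrive and expire. A deviation index could then score each transaction by the MiFIs it contains, although the paper's actual MiFIDI, SDI and TDI formulas are not given here.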
