Mining frequent patterns in a varying-size sliding window of online transactional data streams

In some data stream applications, the information embedded in the data arriving in the most recent time period is of particular interest. This paper proposes a method for efficiently mining the frequent patterns in a varying-size sliding window of online data streams. To highlight recent frequent patterns in the data stream, a time decay model is used to differentiate the patterns of recently generated transactions from those of historical transactions. Concrete bounds on the decay factor are derived that can achieve either 100% recall or 100% precision. A summary data structure, named SWP-tree, is proposed for capturing the content of the transactions in the sliding window by scanning the stream only once. To speed up online processing of new transactions, the frequent-pattern information recorded in the SWP-tree is updated incrementally. To keep the mining operation efficient, the SWP-tree is periodically pruned by identifying insignificant patterns, which comprise two kinds of obsolete patterns and two kinds of infrequent patterns. Since the sliding window can change its size, the effect of window size is also examined. The performance of the proposed technique is evaluated via simulation experiments. The results show that the proposed method is both efficient and scalable, and that it outperforms comparable algorithms.
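
As a rough illustration of how a time decay model favors recent transactions, the sketch below keeps exponentially decayed counts for single items. It is not the SWP-tree and not the paper's algorithm: the class name `DecayedCounter`, the per-transaction decay scheme, and the restriction to single items instead of full patterns are all assumptions made for this example.

```python
class DecayedCounter:
    """Hypothetical decayed support counter for single items (illustrative only)."""

    def __init__(self, decay):
        # decay in (0, 1]; 1.0 means no decay (plain relative frequency).
        if not 0.0 < decay <= 1.0:
            raise ValueError("decay must be in (0, 1]")
        self.decay = decay
        self.tid = 0          # number of transactions seen so far
        self.total = 0.0      # decayed count of all transactions
        self._entries = {}    # item -> (decayed count, tid of last update)

    def add(self, transaction):
        """Process one transaction (an iterable of items) in a single pass."""
        self.tid += 1
        # Every earlier transaction loses weight by one decay step.
        self.total = self.total * self.decay + 1.0
        for item in set(transaction):
            count, last = self._entries.get(item, (0.0, self.tid))
            # Lazy update: apply all decay steps missed since the last update.
            count = count * self.decay ** (self.tid - last) + 1.0
            self._entries[item] = (count, self.tid)

    def support(self, item):
        """Decayed support of an item relative to the decayed stream size."""
        if item not in self._entries or self.total == 0.0:
            return 0.0
        count, last = self._entries[item]
        return count * self.decay ** (self.tid - last) / self.total


counter = DecayedCounter(decay=0.5)
for txn in (["a"], ["b"], ["b"]):
    counter.add(txn)
support_a = counter.support("a")  # old item, heavily discounted
support_b = counter.support("b")  # recent item, dominates
```

Because counts are decayed lazily, an item is only touched when it appears in a transaction or is queried, so each arrival costs time proportional to the transaction length. A smaller decay factor makes older transactions fade faster.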
