Mining top-k frequent patterns over data streams sliding window

Frequent pattern mining in data streams is an important research topic in the data mining community. Previous studies assumed that a minimum support threshold is given in advance for mining frequent patterns. However, choosing such a threshold is typically difficult, so it is often more reasonable to ask users to bound the size of the result instead, that is, to specify k. This study considers mining top-k frequent patterns from data streams using a sliding window technique. A single-pass algorithm, called MSWTP, is developed to generate the top-k frequent patterns without a support threshold. In this method, the transactions in the sliding window are incrementally maintained in a summary data structure, named SWTP-tree, by scanning the stream only once. To make mining efficient, insignificant patterns are distinguished from the others by applying the Chernoff bound. Two kinds of obsolete patterns and one kind of insignificant pattern are periodically pruned from the pattern tree. On request, the k most frequent patterns can be selected from SWTP-tree in descending order of frequency. The performance of the proposed technique is evaluated through simulation experiments. The results show that the proposed method is both efficient and scalable, and that it outperforms comparable algorithms.
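The abstract does not spell out MSWTP's internals, so the following is only a minimal sketch, under simplifying assumptions, of the three ideas it names. The first is single-pass counting of patterns over a count-based sliding window. The second is discarding patterns whose count falls to zero once their transactions expire. The third is returning the k most frequent patterns in descending order of frequency. The sketch uses a plain dictionary instead of an SWTP-tree and caps pattern length at a hypothetical `max_len`. The names `SlidingWindowTopK`, `chernoff_epsilon` and `is_insignificant` are illustrative and are not taken from the paper. The significance test assumes that the window behaves like n independent Bernoulli trials and uses the Chernoff-style deviation bound eps = sqrt(2 * s * ln(2/delta) / n), which is common in stream-mining work. MSWTP's actual use of the Chernoff bound may differ.

```python
from collections import Counter, deque
from itertools import combinations
import math


def chernoff_epsilon(support, n, delta):
    """Chernoff-style deviation bound for a relative support observed over n
    transactions (assumes independent Bernoulli trials; hypothetical helper)."""
    return math.sqrt(2.0 * support * math.log(2.0 / delta) / n)


class SlidingWindowTopK:
    """Exact itemset counts over the last `window_size` transactions.

    Illustrative only: this is NOT MSWTP and builds no SWTP-tree. Itemsets are
    capped at `max_len` items so per-transaction enumeration stays bounded.
    """

    def __init__(self, window_size, max_len=2):
        self.window_size = window_size
        self.max_len = max_len
        self.window = deque()    # transactions currently inside the window
        self.counts = Counter()  # itemset (sorted tuple) -> count in the window

    def _itemsets(self, transaction):
        # All non-empty sub-itemsets of size <= max_len, as sorted tuples.
        items = sorted(set(transaction))
        for r in range(1, min(self.max_len, len(items)) + 1):
            yield from combinations(items, r)

    def add(self, transaction):
        """Slide the window forward by one transaction (one pass, no rescans)."""
        self.window.append(transaction)
        for itemset in self._itemsets(transaction):
            self.counts[itemset] += 1
        if len(self.window) > self.window_size:
            expired = self.window.popleft()
            for itemset in self._itemsets(expired):
                self.counts[itemset] -= 1
                if self.counts[itemset] == 0:
                    del self.counts[itemset]  # pattern no longer occurs in the window

    def top_k(self, k):
        """The k most frequent itemsets, by descending count, ties broken lexicographically."""
        ranked = sorted(self.counts.items(), key=lambda kv: (-kv[1], kv[0]))
        return ranked[:k]

    def is_insignificant(self, itemset, k, delta=0.01):
        """True if even support + eps stays below the current k-th support
        (an assumed pruning rule, shown for illustration)."""
        n = len(self.window)
        ranked = self.top_k(k)
        if n == 0 or len(ranked) < k:
            return False
        kth_support = ranked[-1][1] / n
        support = self.counts.get(itemset, 0) / n
        return support + chernoff_epsilon(support, n, delta) < kth_support
```

For example, with `window_size=3` and `max_len=2`, feeding the transactions {a, b, c}, {a, b}, {a, c}, {a, b} expires the first one. `top_k(2)` then returns `[(('a',), 3), (('a', 'b'), 2)]`, and {b, c} is no longer tracked. Exhaustive subset enumeration grows combinatorially with transaction width, and this cost is the motivation for compact summaries and the kind of pruning the abstract describes.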
