A novel technique for mining closed frequent itemsets using variable sliding window

Frequent itemset mining over dynamic data is an important problem in knowledge discovery and data mining, and various data stream models are used for it. In a data stream model, data arrive at such high speed that mining algorithms must process them under strict time and space constraints. Because it emphasizes recent data and has a bounded memory requirement, the sliding window model is widely used for mining frequent itemsets over data streams. In this paper we propose an algorithm named Variable-Moment for mining both frequent and closed frequent itemsets over a data stream. The algorithm is suited to detecting recent changes in the set of frequent itemsets because its window size is variable and is determined dynamically by the extent of concept drift in the arriving data stream. The window expands while there is no concept drift in the arriving stream and shrinks when a concept change occurs. Relative support is used instead of absolute support so that the support threshold remains meaningful as the window size varies. The algorithm uses an in-memory data structure to store frequent itemsets. This data structure is updated whenever a batch of transactions is added to or deleted from the sliding window, so that the algorithm outputs exact frequent itemsets. Extensive experiments on both real and synthetic data show that the algorithm detects concept changes and adapts to the new concept in the data stream by adjusting its window size.
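
The variable-window idea described above can be illustrated with a minimal sketch. It is not the Variable-Moment algorithm itself: the abstract does not specify the drift measure, the shrinking policy, or the in-memory data structure, so everything here is an assumption made for illustration. The class `VariableWindow`, the helpers `count_itemsets` and `frequent`, the Jaccard-distance drift measure, the `drift_threshold` parameter, and the policy of discarding all old batches on drift are hypothetical. Itemsets are enumerated by brute force up to two items, and closedness is not checked.

```python
from collections import Counter, deque
from itertools import combinations


def count_itemsets(transactions, max_size=2):
    """Count every itemset of up to max_size items (brute force, illustration only)."""
    counts = Counter()
    for t in transactions:
        items = sorted(t)
        for k in range(1, max_size + 1):
            counts.update(combinations(items, k))
    return counts


def frequent(counts, n_transactions, min_support):
    """Itemsets whose relative support (count / n_transactions) reaches min_support."""
    if n_transactions == 0:
        return set()
    return {s for s, c in counts.items() if c / n_transactions >= min_support}


class VariableWindow:
    """Hypothetical variable-size sliding window over batches of transactions."""

    def __init__(self, min_support=0.5, drift_threshold=0.5):
        self.min_support = min_support          # relative support threshold
        self.drift_threshold = drift_threshold  # assumed drift cut-off
        self.batches = deque()                  # batches currently in the window
        self.counts = Counter()                 # itemset counts over the window
        self.n = 0                              # number of transactions in the window

    def _remove_oldest(self):
        old = self.batches.popleft()
        self.counts.subtract(count_itemsets(old))
        self.counts = +self.counts              # drop entries whose count fell to zero
        self.n -= len(old)

    def add_batch(self, batch):
        """Add a batch; shrink the window first if the batch signals drift."""
        batch = [frozenset(t) for t in batch]
        batch_counts = count_itemsets(batch)
        new_fi = frequent(batch_counts, len(batch), self.min_support)
        old_fi = frequent(self.counts, self.n, self.min_support)
        union = new_fi | old_fi
        # Assumed drift measure: Jaccard distance between the frequent itemsets
        # of the incoming batch and those of the current window.
        drift = 1 - len(new_fi & old_fi) / len(union) if union else 0.0
        if self.batches and drift > self.drift_threshold:
            while self.batches:                 # concept change: shrink the window
                self._remove_oldest()
        self.batches.append(batch)              # no drift: the window simply grows
        self.counts.update(batch_counts)
        self.n += len(batch)
        return drift

    def frequent_itemsets(self):
        return frequent(self.counts, self.n, self.min_support)


window = VariableWindow(min_support=0.5, drift_threshold=0.5)
window.add_batch([{"a", "b"}, {"a", "b"}, {"a"}])
window.add_batch([{"a", "b"}, {"a", "b"}])        # same concept: window expands
sizes_before = (len(window.batches), window.n)
window.add_batch([{"c", "d"}, {"c", "d"}, {"c"}])  # new concept: window shrinks
sizes_after = (len(window.batches), window.n)
```

Because the threshold is a fraction of the current window rather than a fixed count, the same `min_support` applies whether the window holds one batch or many, which is the reason relative support pairs naturally with a window whose size changes.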
