A sliding window based algorithm for frequent closed itemset mining over data streams

Frequent pattern mining over data streams is an important problem in data mining and knowledge discovery. Mining frequent closed itemsets within a sliding window, rather than the complete set of frequent itemsets, is attractive because it requires only a limited amount of memory and processing power. Moreover, concept change can be handled faster within a compact set of closed patterns. However, this approach requires flexible and efficient data structures as well as intuitive algorithms. In this paper, we introduce an effective and efficient algorithm for frequent closed itemset mining over data streams operating in the sliding window model. The algorithm uses a novel data structure for storing the transactions of the window and the corresponding frequent closed itemsets. Moreover, the support of a new frequent closed itemset is computed efficiently, and an old pattern is removed from the monitoring set when it is no longer a frequent closed itemset. Extensive experiments on both real and synthetic data streams show that the proposed algorithm is superior to previously devised algorithms in terms of runtime and memory usage.
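To make the terminology concrete, the following is a minimal brute-force sketch of what "frequent closed itemsets over a sliding window" means. It is not the algorithm or the data structure proposed in the paper, which the abstract does not describe in detail. The names `closed_frequent_itemsets` and `SlidingWindowMiner`, the count-based window, and the absolute `min_support` threshold are all assumptions made for illustration. An itemset is treated as closed when no proper superset has the same support within the current window.

```python
from collections import deque
from itertools import combinations


def closed_frequent_itemsets(window, min_support):
    """Return {itemset: support} for the closed frequent itemsets in `window`.

    Brute-force enumeration, exponential in the number of distinct items;
    intended only to illustrate the definitions, not to be efficient.
    `min_support` is an absolute transaction count (an assumption).
    """
    transactions = [frozenset(t) for t in window]
    items = sorted(set().union(*transactions)) if transactions else []

    # Step 1: count the support of every candidate itemset, keep frequent ones.
    support = {}
    for size in range(1, len(items) + 1):
        for combo in combinations(items, size):
            candidate = frozenset(combo)
            count = sum(1 for t in transactions if candidate <= t)
            if count >= min_support:
                support[candidate] = count

    # Step 2: keep an itemset only if no proper superset has equal support.
    # A superset with equal support is itself frequent, so checking the
    # frequent itemsets is sufficient.
    closed = {}
    for itemset, count in support.items():
        absorbed = any(
            itemset < other and count == other_count
            for other, other_count in support.items()
        )
        if not absorbed:
            closed[itemset] = count
    return closed


class SlidingWindowMiner:
    """Hypothetical helper: keeps the last `window_size` transactions."""

    def __init__(self, window_size, min_support):
        # deque with maxlen drops the oldest transaction automatically.
        self.window = deque(maxlen=window_size)
        self.min_support = min_support

    def add(self, transaction):
        """Slide the window by one transaction and re-mine from scratch."""
        self.window.append(frozenset(transaction))
        return closed_frequent_itemsets(self.window, self.min_support)


if __name__ == "__main__":
    miner = SlidingWindowMiner(window_size=3, min_support=2)
    for t in [{"a", "b"}, {"a", "b", "c"}, {"a", "c"}, {"b", "c"}]:
        result = miner.add(t)
        print(sorted((sorted(s), c) for s, c in result.items()))
```

Re-mining the whole window on every slide is exactly the cost that incremental approaches aim to avoid; the sketch shows only which patterns enter and leave the monitored set as old transactions expire.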
