An efficient closed frequent itemset miner for the MOA stream mining system

Mining itemsets is a central task in data mining, in both the batch and the streaming paradigms. While robust, efficient, and well-tested implementations exist for batch mining, hardly any publicly available equivalent exists for the streaming scenario. The lack of an efficient, usable tool hinders adoption by practitioners and makes it difficult to assess new research in the area. To alleviate this situation, we review the algorithms described in the literature, and implement and evaluate the IncMine algorithm by Cheng, Ke and Ng [J. Intell. Inf. Syst. 31(3) (2008), 191–215] for mining frequent closed itemsets from data streams. Our implementation runs on top of the MOA (Massive Online Analysis) stream mining framework, which eases its use and its integration with other stream mining tasks. We provide a rigorous PAC-style analysis of the quality of the output of IncMine as a function of its parameters; this type of analysis is rare for pattern mining algorithms. As a by-product, the analysis shows that one of the user-provided parameters in the original description can be removed entirely while retaining the performance guarantees. Finally, we confirm experimentally, on both synthetic and real data, the excellent performance of the algorithm reported in the original paper, as well as its ability to handle concept drift.
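To make the mining target concrete, the following minimal sketch computes the frequent closed itemsets of one small batch of transactions. An itemset is frequent if its relative support reaches a threshold, and closed if no proper superset has the same support. This is a brute-force illustration of the definition only, not IncMine and not part of the MOA API; the function name `closed_frequent_itemsets` and the parameter `min_support` are hypothetical names chosen for this example.

```python
from itertools import combinations


def closed_frequent_itemsets(transactions, min_support):
    """Return {itemset: support count} for the frequent closed itemsets.

    Hypothetical helper for illustration: enumerates every candidate
    itemset, so it is exponential in the number of distinct items.
    """
    transactions = [frozenset(t) for t in transactions]
    n = len(transactions)
    items = sorted(set().union(*transactions))

    # Step 1: count the support of every candidate and keep the frequent ones.
    frequent = {}
    for size in range(1, len(items) + 1):
        for candidate in combinations(items, size):
            itemset = frozenset(candidate)
            count = sum(1 for t in transactions if itemset <= t)
            if count / n >= min_support:
                frequent[itemset] = count

    # Step 2: keep an itemset only if no proper superset has the same support.
    # A superset with equal support is itself frequent, so searching within
    # `frequent` is sufficient.
    return {
        itemset: count
        for itemset, count in frequent.items()
        if not any(itemset < other and count == frequent[other] for other in frequent)
    }


# Toy batch of four transactions with a 50% support threshold.
window = [{"a", "b", "c"}, {"a", "b"}, {"a", "c"}, {"a"}]
result = closed_frequent_itemsets(window, min_support=0.5)
# {a}: 4, {a, b}: 2, {a, c}: 2. The itemsets {b} and {c} are frequent but not
# closed, because {a, b} and {a, c} have the same support.
```

In this toy batch the closed itemsets carry the support of every frequent itemset while being fewer in number, which is the usual motivation for mining closed itemsets rather than all frequent ones.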
