Stream mining of frequent sets with limited memory

With advances in technology, streams of data are produced in many applications. Efficient techniques for extracting implicit, previously unknown, and potentially useful information (e.g., in the form of frequent sets) from data streams are in demand. Many existing stream mining algorithms capture important streaming data and assume that the captured data can fit into main memory. However, problems arise when the available memory is so limited that this assumption does not hold. In this paper, we propose a novel data structure called DSTable to capture important data from the streams onto the disk. The DSTable can be easily maintained and is applicable to mining frequent sets from datasets, especially in limited-memory environments.
