Approximate mining of maximal frequent itemsets in data streams with different window models

A data stream is a massive, open-ended sequence of data elements continuously generated at a rapid rate. Mining data streams is more difficult than mining static databases because of the huge volume, high speed, and continuous nature of streaming data. In this paper, we propose a new one-pass algorithm called DSM-MFI (short for Data Stream Mining for Maximal Frequent Itemsets), which mines the set of all maximal frequent itemsets in landmark windows over data streams. A new summary data structure called the summary frequent itemset forest (abbreviated as SFI-forest) is developed for incrementally maintaining the essential information about the maximal frequent itemsets embedded in the stream so far. Theoretical analysis and experimental studies show that the proposed algorithm is efficient and scalable for mining the set of all maximal frequent itemsets over the entire history of the data streams.
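To make the target of the mining task concrete, the following is a minimal brute-force sketch of what "maximal frequent itemsets over a landmark window" means: an itemset is frequent if it occurs in at least a given number of the transactions seen since the landmark, and maximal if no proper superset is also frequent. This is not DSM-MFI and does not use an SFI-forest; the function name `maximal_frequent_itemsets`, the absolute threshold `min_count`, and the toy stream are assumptions made for illustration only, and the exhaustive subset counting is exponential in transaction length.

```python
from collections import Counter
from itertools import combinations


def maximal_frequent_itemsets(transactions, min_count):
    """Brute-force illustration (hypothetical helper, not the paper's algorithm).

    Counts every non-empty subset of every transaction in one pass over the
    stream (landmark window: everything seen so far), then keeps the frequent
    itemsets that have no frequent proper superset.
    """
    counts = Counter()
    for transaction in transactions:  # single pass over the stream
        items = sorted(set(transaction))
        for size in range(1, len(items) + 1):
            for subset in combinations(items, size):
                counts[frozenset(subset)] += 1

    # Frequent: support count reaches the (assumed absolute) threshold.
    frequent = {s for s, c in counts.items() if c >= min_count}
    # Maximal: no other frequent itemset strictly contains this one.
    return {s for s in frequent if not any(s < other for other in frequent)}


# Toy stream of five transactions (made-up data).
stream = [
    ["a", "b", "c"],
    ["a", "b"],
    ["a", "c"],
    ["b", "c"],
    ["a", "b", "c"],
]
result = maximal_frequent_itemsets(stream, min_count=3)
```

In this toy stream every pair of items occurs three times while the full set {a, b, c} occurs only twice, so the three pairs are the maximal frequent itemsets. A summary structure such as the one the abstract describes exists precisely to avoid enumerating and storing all subsets in this way.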
