A New Algorithm for Maintaining Closed Frequent Itemsets in Data Streams by Incremental Updates

Online mining of closed frequent itemsets over streaming data is one of the most important issues in mining data streams. In this paper, we propose an efficient one-pass algorithm, NewMoment, to maintain the set of closed frequent itemsets in data streams with a transaction-sensitive sliding window. The proposed algorithm uses an effective bit-sequence representation of items to reduce the time and memory needed to slide the window. Experiments show that the proposed algorithm not only attains highly accurate mining results, but also runs significantly faster and consumes less memory than the existing algorithm, Moment, for mining closed frequent itemsets over recent data streams.
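To make the bit-sequence idea concrete, the following is a minimal Python sketch and not the NewMoment implementation itself. It assumes that each item keeps one bit per transaction in the current window, that sliding the window is a left shift which drops the oldest bit, and that the support of an itemset is the number of set bits in the AND of its items' sequences. The class name `BitSlidingWindow` and its methods are hypothetical, and the maintenance of closed itemsets is not shown.

```python
class BitSlidingWindow:
    """Illustrative transaction-sensitive sliding window (hypothetical names)."""

    def __init__(self, window_size):
        self.w = window_size
        self.mask = (1 << window_size) - 1  # keeps only the newest w bits
        self.bits = {}                      # item -> bit-sequence stored as an int
        self.count = 0                      # transactions currently in the window

    def add(self, transaction):
        """Slide the window by one transaction."""
        items = set(transaction)
        for item in items:
            self.bits.setdefault(item, 0)
        for item in list(self.bits):
            # Shift out the oldest transaction, then append the newest bit.
            shifted = (self.bits[item] << 1) & self.mask
            updated = shifted | (1 if item in items else 0)
            if updated:
                self.bits[item] = updated
            else:
                del self.bits[item]         # item no longer occurs in the window
        self.count = min(self.count + 1, self.w)

    def support(self, itemset):
        """Count the window transactions that contain every item in itemset."""
        acc = (1 << self.count) - 1         # one bit per transaction in the window
        for item in itemset:
            acc &= self.bits.get(item, 0)
        return bin(acc).count("1")


window = BitSlidingWindow(window_size=3)
for t in (["a", "b"], ["b", "c"], ["a", "b", "c"]):
    window.add(t)
support_ab_before = window.support(["a", "b"])  # window: {a,b}, {b,c}, {a,b,c}
window.add(["a"])                               # the oldest transaction {a,b} expires
support_ab_after = window.support(["a", "b"])   # window: {b,c}, {a,b,c}, {a}
```

In this sketch, sliding costs one shift per tracked item rather than a rescan of the window, which is the kind of saving the abstract attributes to the bit-sequence representation.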
