Paper information - An Efficient Algorithm for Mining Frequent Patterns over High Speed Data Streams

Existing algorithms for mining frequent patterns usually proceed in two steps. The first computes the frequency of itemsets while monitoring each arriving transaction of the data stream. The second outputs the frequent itemsets. Because the number of item combinations is large, computing frequencies takes a lot of time. Therefore, for high-speed data streams with long transactions, there may not be enough time to process every arriving transaction. This paper proposes a highly efficient algorithm for mining frequent patterns over high-speed data streams. The algorithm defers the frequency calculation to the second step. The first step stores only the necessary information for each transaction, which avoids missing any arriving transaction. Because the two steps are relatively independent, they can run concurrently. Experiments show that the algorithm outperforms the existing algorithms LossyCounting and FDPM, especially for data streams with long transactions.
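
The abstract does not say what information the first step stores or how the second step derives frequencies, so the following is only a minimal sketch of the general idea of deferred counting, not the paper's algorithm. It assumes that the first step simply records each distinct transaction with an arrival count, and that the second step enumerates itemsets up to a fixed size. The names `DeferredMiner`, `store`, `mine`, and `max_len` are hypothetical.

```python
from collections import Counter
from itertools import combinations


class DeferredMiner:
    """Illustrative two-step miner: cheap storage on arrival, counting on demand."""

    def __init__(self, max_len=3):
        self.max_len = max_len   # hypothetical cap on itemset size (assumption)
        self.stored = Counter()  # distinct transaction -> number of arrivals

    def store(self, transaction):
        # Step 1: record the transaction only; no itemset enumeration here,
        # so the per-arrival cost is linear in the transaction length.
        self.stored[frozenset(transaction)] += 1

    def mine(self, min_support):
        # Step 2: enumerate itemsets once per distinct stored transaction
        # and weight each by how many times that transaction arrived.
        total = sum(self.stored.values())
        counts = Counter()
        for items, times in self.stored.items():
            ordered = sorted(items)
            for k in range(1, min(self.max_len, len(ordered)) + 1):
                for itemset in combinations(ordered, k):
                    counts[itemset] += times
        threshold = min_support * total
        return {s: c for s, c in counts.items() if c >= threshold}


miner = DeferredMiner(max_len=2)
for t in [["a", "b", "c"], ["a", "b"], ["a", "c"], ["b", "c"], ["a", "b"]]:
    miner.store(t)
frequent = miner.mine(min_support=0.5)
```

Because `store` and `mine` touch only the shared `stored` table, the two steps could in principle run in separate threads with a lock around that table. The sketch leaves this out to stay short.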

Cai-xia Meng | C. Meng

[1] Suh-Yin Lee, et al. An Efficient Algorithm for Mining Frequent Itemsets over the Entire History of Data Streams, 2004.

[2] Hongjun Lu, et al. False Positive or False Negative: Mining Frequent Itemsets from High Speed Transactional Data Streams, 2004, VLDB.

[3] Divyakant Agrawal, et al. Efficient Computation of Frequent and Top-k Elements in Data Streams, 2005, ICDT.

[4] Jian Pei, et al. Mining frequent patterns without candidate generation, 2000, SIGMOD '00.

[5] Philip S. Yu, et al. Mining concept-drifting data streams using ensemble classifiers, 2003, KDD '03.

[6] 沈錳坤. An Efficient Algorithm for Mining Frequent Itemsets over the Entire History of Data Streams, 2004.

[7] Moses Charikar, et al. Finding frequent items in data streams, 2002, Theor. Comput. Sci.

[8] Rakesh Agrawal, et al. Fast Algorithms for Mining Association Rules, 1994, VLDB.

[9] Yossi Matias, et al. New sampling-based summary statistics for improving approximate query answers, 1998, SIGMOD '98.

[10] Zhigang Chen, et al. A Mining Maximal Frequent Itemsets over the Entire History of Data Streams, 2009, 2009 First International Workshop on Database Technology and Applications.

[11] Rajeev Motwani, et al. Approximate Frequency Counts over Data Streams, 2012, VLDB.