An Efficient Algorithm for Mining Frequent Patterns over High Speed Data Streams

Existing algorithms for mining frequent patterns usually work in two steps. The first calculates the frequency of itemsets as each transaction of the data stream arrives. The second outputs the frequent itemsets. Because the number of item combinations is large, calculating frequencies takes a lot of time. For high speed data streams with long transactions, there may therefore not be enough time to process every arriving transaction. This paper proposes a highly effective algorithm for mining frequent patterns over high speed data streams. The algorithm defers the frequency calculation to the second step. The first step stores only the necessary information for each transaction, which avoids missing any arriving transaction. Because the two steps are relatively independent, they can run in parallel. Experiments show that the algorithm outperforms the existing algorithms LossyCounting and FDPM, especially for data streams with long transactions.
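The split between a cheap storing step and a deferred counting step can be illustrated with a minimal Python sketch. It is not the paper's algorithm. The abstract does not say what "necessary information" is stored, so the sketch assumes the whole transaction is buffered, and it counts itemsets by plain enumeration up to a hypothetical `max_size`. The names `store_transactions`, `mine_frequent_itemsets`, `run`, `min_support`, and `max_size` are all illustrative assumptions.

```python
from collections import Counter
from itertools import combinations
from queue import Queue
from threading import Thread


def store_transactions(stream, buffer):
    """Step 1 (assumed form): record each arriving transaction, no counting."""
    for transaction in stream:
        buffer.put(frozenset(transaction))
    buffer.put(None)  # sentinel marking the end of the stream


def mine_frequent_itemsets(buffer, min_support, max_size=2):
    """Step 2 (assumed form): count itemsets from the stored transactions."""
    counts = Counter()
    total = 0
    while True:
        transaction = buffer.get()
        if transaction is None:
            break
        total += 1
        items = sorted(transaction)
        # Naive enumeration of itemsets up to max_size; an illustration only.
        for size in range(1, max_size + 1):
            counts.update(combinations(items, size))
    threshold = min_support * total
    return {itemset: n for itemset, n in counts.items() if n >= threshold}


def run(stream, min_support, max_size=2):
    """Run both steps concurrently: one thread stores, the caller counts."""
    buffer = Queue()
    producer = Thread(target=store_transactions, args=(stream, buffer))
    producer.start()
    result = mine_frequent_itemsets(buffer, min_support, max_size)
    producer.join()
    return result
```

For example, `run([["a", "b", "c"], ["a", "b"], ["a", "c"], ["b"]], 0.5)` keeps the itemsets that occur in at least two of the four transactions. The queue decouples the two steps, so the storing thread never waits on the counting work.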