Mining the Frequent Patterns in an Arbitrary Sliding Window over Online Data Streams: Mining the Frequent Patterns in an Arbitrary Sliding Window over Online Data Streams

Because of the fluidity and continuity of data stream, the knowledge embedded in stream data is most likely to be changed as time goes by. Thus, in most data stream applications, people are more interested in the information of the recent transactions than that of the old. This paper proposes a method for mining the frequent patterns in an arbitrary sliding window of data streams. As data stream flows, the contents of the data stream are captured with a compact prefix-tree by scanning the stream only once. And the obsolete and infrequent items are deleted by periodically pruning the tree. To differentiate the patterns of recently generated transactions from those of historic transactions, a time decaying model is also applied. Extensive simulations are conducted and the experimental results show that the proposed method is efficient and scalable, and also superior to other analogous algorithms.

[1]  Suh-Yin Lee,et al.  Incremental Mining of Sequential Patterns over a Stream Sliding Window , 2006, Sixth IEEE International Conference on Data Mining - Workshops (ICDMW'06).

[2]  Philip S. Yu,et al.  Optimal multi-scale patterns in time series streams , 2006, SIGMOD Conference.

[3]  Hongjun Lu,et al.  A false negative approach to mining frequent itemsets from high speed transactional data streams , 2006, Inf. Sci..

[4]  Won Suk Lee,et al.  Finding recent frequent itemsets adaptively over online data streams , 2003, KDD '03.

[5]  Gurmeet Singh Manku,et al.  Approximate counts and quantiles over sliding windows , 2004, PODS.

[6]  Lap-Kei Lee,et al.  A simpler and more efficient deterministic scheme for finding frequent items over sliding windows , 2006, PODS '06.

[7]  Nan Jiang,et al.  Research issues in data stream association rule mining , 2006, SGMD.

[8]  Nan Jiang,et al.  CFI-Stream: mining closed frequent itemsets in data streams , 2006, KDD '06.

[9]  Shonali Krishnaswamy,et al.  Mining data streams: a review , 2005, SGMD.

[10]  Philip S. Yu,et al.  Mining Data Streams , 2005, The Data Mining and Knowledge Discovery Handbook.

[11]  Johannes Gehrke,et al.  Querying and mining data streams: you only get one look a tutorial , 2002, SIGMOD '02.

[12]  Raymond Chi-Wing Wong,et al.  Mining top-K frequent itemsets from data streams , 2006, Data Mining and Knowledge Discovery.

[13]  Philip S. Yu,et al.  Mining Frequent Patterns in Data Streams at Multiple Time Granularities , 2002 .

[14]  Jian Pei,et al.  Mining frequent patterns without candidate generation , 2000, SIGMOD '00.

[15]  Carson Kai-Sang Leung,et al.  DSTree: A Tree Structure for the Mining of Frequent Sets from Data Streams , 2006, Sixth International Conference on Data Mining (ICDM'06).