An Efficient Frequent Pattern Mining Algorithm for Data Stream

Mining frequent patterns from transaction databases, time series, and data streams is now an important task. Over the last decade, frequent pattern mining algorithms have fallen mainly into two kinds: Apriori, which is based on generate-and-test, and FP-growth, which is based on divide-and-conquer. Both have been widely used in static data mining. With new requirements in data mining, however, frequent pattern mining is no longer restricted to static datasets. For data streams, frequent pattern mining algorithms must be able to update and adjust their structures quickly in order to remain efficient. This paper proposes a novel structure, NC-Tree (New Compact Tree), which recodes and filters the original data to compress the dataset. A new frequent pattern mining algorithm based on NC-Tree is also introduced, which updates and adjusts the tree more efficiently. Experiments show that the structure and algorithm clearly improve mining efficiency while maintaining high accuracy.
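To make the "generate and test" idea behind Apriori concrete, here is a minimal Python sketch of the classic level-wise approach. It assumes an in-memory list of transactions and an absolute minimum support count; the function name `apriori` and its parameters are illustrative only. This is the standard baseline the abstract contrasts against, not the NC-Tree algorithm, whose details are not given here.

```python
from itertools import combinations


def apriori(transactions, min_support):
    """Return {frozenset(itemset): support_count} for every itemset whose
    support count is at least min_support (an absolute count, assumed)."""
    # Convert each transaction to a frozenset once so subset tests are cheap.
    txns = [frozenset(t) for t in transactions]

    def count(candidates):
        # "Test" step: one full pass over the data counts every candidate,
        # then candidates below the support threshold are discarded.
        counts = {c: 0 for c in candidates}
        for t in txns:
            for c in candidates:
                if c <= t:
                    counts[c] += 1
        return {c: n for c, n in counts.items() if n >= min_support}

    # Level 1: frequent single items.
    items = {frozenset([i]) for t in txns for i in t}
    frequent = count(items)
    result = dict(frequent)

    k = 2
    while frequent:
        candidates = set()
        # "Generate" step: join pairs of frequent (k-1)-itemsets whose union
        # has exactly k items...
        for a, b in combinations(list(frequent), 2):
            union = a | b
            if len(union) != k:
                continue
            # ...and prune any candidate with an infrequent (k-1)-subset
            # (Apriori property: every subset of a frequent itemset is frequent).
            if all(frozenset(s) in frequent for s in combinations(union, k - 1)):
                candidates.add(union)
        frequent = count(candidates)
        result.update(frequent)
        k += 1
    return result


if __name__ == "__main__":
    data = [["a", "b", "c"], ["a", "b"], ["a", "c"], ["b", "c"], ["a", "b", "c"]]
    for itemset, support in sorted(apriori(data, 3).items(), key=lambda kv: sorted(kv[0])):
        print(sorted(itemset), support)
```

Each level needs another full pass over the data, and the candidate sets can grow quickly. FP-growth avoids explicit candidate generation by compressing the transactions into a prefix tree and mining that tree recursively; NC-Tree is likewise a tree structure.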