Efficient single-pass frequent pattern mining using a prefix-tree

The FP-growth algorithm, which mines frequent patterns from an FP-tree, has been widely studied because it can dramatically outperform the candidate generation-and-test paradigm of Apriori. However, it still requires two database scans, which is ill-suited to efficient data stream processing. In this paper, we present a novel tree structure, called the CP-tree (compact pattern tree), that captures database information in a single scan (insertion phase) and then provides the same mining performance as the FP-growth method (restructuring phase). The CP-tree introduces dynamic tree restructuring, which produces a highly compact, frequency-descending tree structure at runtime. We also propose an efficient tree restructuring method, called the branch sorting method, that restructures a prefix tree branch by branch. Moreover, the CP-tree fully supports interactive and incremental mining. Extensive experimental results show that, with a single database scan, the CP-tree is efficient for frequent pattern mining as well as for interactive and incremental mining.
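
The two phases described in the abstract can be illustrated with a minimal Python sketch. It is not the CP-tree implementation, and it rests on stated assumptions. Transactions are inserted in lexicographic order, which stands in for whatever initial item order a real implementation uses, and item frequencies are counted during the same pass. Restructuring is simplified: every branch is extracted, its items are re-sorted into frequency-descending order (ties broken by item label), and it is reinserted into a fresh tree. This differs from the paper's in-place branch sorting method. All helper names (`single_scan_build`, `restructure`, `two_scan_build`, and so on) are hypothetical. Because the sketch keeps every item, the restructured tree is compared against a two-scan, FP-tree-style build with no minimum-support pruning.

```python
from collections import Counter


class Node:
    """Prefix-tree node: an item label, a support count, and children keyed by item."""

    def __init__(self, item=None):
        self.item = item
        self.count = 0
        self.children = {}


def insert(root, items, count=1):
    """Insert one already-ordered transaction, adding `count` to every node on its path."""
    node = root
    for item in items:
        node = node.children.setdefault(item, Node(item))
        node.count += count


def frequency_key(freq):
    """Sort key for frequency-descending order; ties broken by item label (an assumption)."""
    return lambda item: (-freq[item], item)


def single_scan_build(transactions):
    """Insertion phase (sketch): one pass that builds the tree AND counts item frequencies.

    Hypothetical choice: items are inserted in lexicographic order.
    """
    root, freq = Node(), Counter()
    for t in transactions:
        items = sorted(set(t))
        freq.update(items)
        insert(root, items)
    return root, freq


def branches(node, prefix=()):
    """Yield (path, n): exactly n transactions end at the last node of `path`."""
    for child in node.children.values():
        path = prefix + (child.item,)
        ending_here = child.count - sum(c.count for c in child.children.values())
        if ending_here > 0:
            yield path, ending_here
        yield from branches(child, path)


def restructure(root, freq):
    """Restructuring phase (simplified): re-sort each branch into frequency-descending
    order and reinsert it into a new tree, instead of sorting the tree in place."""
    key = frequency_key(freq)
    new_root = Node()
    for path, n in branches(root):
        insert(new_root, sorted(path, key=key), n)
    return new_root


def two_scan_build(transactions):
    """Reference FP-tree-style build: scan 1 counts items, scan 2 inserts sorted
    transactions. No minimum-support pruning, so it is comparable to the sketch."""
    freq = Counter(item for t in transactions for item in set(t))
    key = frequency_key(freq)
    root = Node()
    for t in transactions:
        insert(root, sorted(set(t), key=key))
    return root


def to_dict(node):
    """Nested-dict snapshot {item: (count, subtree)} for comparing trees."""
    return {item: (c.count, to_dict(c)) for item, c in node.children.items()}


def size(node):
    """Number of nodes below `node` (the root itself is not counted)."""
    return sum(1 + size(c) for c in node.children.values())
```

With `db = [["a", "b"], ["b", "c"], ["b"], ["a", "b", "c"]]`, the lexicographic tree built by `single_scan_build` has five nodes. After `restructure`, it has four nodes, because every transaction now starts with `b`, the most frequent item, and shares that root child. The restructured tree is identical to the one produced by `two_scan_build`, which illustrates the abstract's claim that one scan plus restructuring yields the same frequency-descending structure that FP-growth obtains with two scans.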
