Mop: An Efficient Algorithm for Mining Frequent Pattern with Subtree Traversing

Mining frequent patterns in database has emerged as an important task in knowledge discovery and data mining. In this paper, we present an efficient algorithm called Mop for fast frequent pattern discovery. Mop utilizes a new kind of data structure called OP_tree (ordered pattern tree) and some particular properties of frequent patterns to facilitate the process of mining frequent patterns. An OP_tree is a special frequent pattern tree, where the children of any node are sorted according to the supports of corresponding items. Efficiency of Mop is achieved with three techniques: (1) it adopts OP_tree to store a large database to avoid repetitive database scans, (2) it finds all frequent 2-patterns in the construction of OP_tree to avoid the costly generation of a large number of candidate 2-patterns, (3) the supports of candidate k-patterns (k>2) can be obtained by traversing a few of specific subtrees of the OP_tree, which greatly reduces the search space and avoid multi-scans of a database. We experimentally compare our algorithm with the Apriori algorithm and the FP-growth algorithm on one real database and one synthetical database. The experimental results show that Mop is about an order of magnitude faster than the Apriori algorithm. Mop also outperforms the FP-growth algorithm, especially when support threshold is very low and databases are quite large.

[1]  Jian Pei,et al.  Mining Frequent Patterns without Candidate Generation: A Frequent-Pattern Tree Approach , 2006, Sixth IEEE International Conference on Data Mining - Workshops (ICDMW'06).

[2]  Ramakrishnan Srikant,et al.  Mining sequential patterns , 1995, Proceedings of the Eleventh International Conference on Data Engineering.

[3]  Jiawei Han,et al.  Efficient mining of partial periodic patterns in time series database , 1999, Proceedings 15th International Conference on Data Engineering (Cat. No.99CB36337).

[4]  Jiawei Han,et al.  TFP: an efficient algorithm for mining top-k frequent closed itemsets , 2005, IEEE Transactions on Knowledge and Data Engineering.

[5]  Ramakrishnan Srikant,et al.  Fast Algorithms for Mining Association Rules in Large Databases , 1994, VLDB.

[6]  Mohammed J. Zaki,et al.  CHARM: An Efficient Algorithm for Closed Itemset Mining , 2002, SDM.

[7]  Jiawei Han,et al.  Data Mining: Concepts and Techniques , 2000 .

[8]  Mohammed J. Zaki,et al.  Fast vertical mining using diffsets , 2003, KDD '03.

[9]  Ee-Peng Lim,et al.  A support-ordered trie for fast frequent itemset discovery , 2004, IEEE Transactions on Knowledge and Data Engineering.

[10]  Jian Pei,et al.  Mining frequent patterns without candidate generation , 2000, SIGMOD 2000.

[11]  Laks V. S. Lakshmanan,et al.  Exploratory mining and pruning optimizations of constrained associations rules , 1998, SIGMOD '98.

[12]  Philip S. Yu,et al.  Using a Hash-Based Method with Transaction Trimming for Mining Association Rules , 1997, IEEE Trans. Knowl. Data Eng..

[13]  Devavrat Shah,et al.  Turbo-charging vertical mining of large databases , 2000, SIGMOD 2000.

[14]  Mohammed J. Zaki Scalable Algorithms for Association Mining , 2000, IEEE Trans. Knowl. Data Eng..

[15]  Tomasz Imielinski,et al.  Mining association rules between sets of items in large databases , 1993, SIGMOD Conference.

[16]  Wynne Hsu,et al.  Integrating Classification and Association Rule Mining , 1998, KDD.

[17]  Jennifer Widom,et al.  Clustering association rules , 1997, Proceedings 13th International Conference on Data Engineering.

[18]  Sunita Sarawagi,et al.  Integrating association rule mining with relational database systems: alternatives and implications , 1998, SIGMOD '98.

[19]  Shamkant B. Navathe,et al.  An Efficient Algorithm for Mining Association Rules in Large Databases , 1995, VLDB.

[20]  Hongjun Lu,et al.  Ascending frequency ordered prefix-tree: efficient mining of frequent patterns , 2003, Eighth International Conference on Database Systems for Advanced Applications, 2003. (DASFAA 2003). Proceedings..

[21]  Ferenc Bodon,et al.  A fast APRIORI implementation , 2003, FIMI.

[22]  Mohammed J. Zaki,et al.  Efficient algorithms for mining closed itemsets and their lattice structure , 2005, IEEE Transactions on Knowledge and Data Engineering.

[23]  Geert Wets,et al.  Using association rules for product assortment decisions: a case study , 1999, KDD '99.

[24]  Jian Pei,et al.  CLOSET+: searching for the best strategies for mining frequent closed itemsets , 2003, KDD '03.

[25]  Laks V. S. Lakshmanan,et al.  Efficient mining of constrained correlated sets , 2000, Proceedings of 16th International Conference on Data Engineering (Cat. No.00CB37073).

[26]  Rajeev Motwani,et al.  Beyond market baskets: generalizing association rules to correlations , 1997, SIGMOD '97.