TFP: an efficient algorithm for mining top-k frequent closed itemsets

Frequent itemset mining has been studied extensively in literature. Most previous studies require the specification of a min/spl I.bar/support threshold and aim at mining a complete set of frequent itemsets satisfying min/spl I.bar/support. However, in practice, it is difficult for users to provide an appropriate min/spl I.bar/support threshold. In addition, a complete set of frequent itemsets is much less compact than a set of frequent closed itemsets. In this paper, we propose an alternative mining task: mining top-k frequent closed itemsets of length no less than min/spl I.bar/l, where k is the desired number of frequent closed itemsets to be mined, and min/spl I.bar/l is the minimal length of each itemset. An efficient algorithm, called TFP, is developed for mining such itemsets without mins/spl I.bar/support. Starting at min/spl I.bar/support = 0 and by making use of the length constraint and the properties of top-k frequent closed itemsets, min/spl I.bar/support can be raised effectively and FP-Tree can be pruned dynamically both during and after the construction of the tree using our two proposed methods: the closed node count and descendant/spl I.bar/sum. Moreover, mining is further speeded up by employing a top-down and bottom-up combined FP-Tree traversing strategy, a set of search space pruning methods, a fast 2-level hash-indexed result tree, and a novel closed itemset verification scheme. Our extensive performance study shows that TFP has high performance and linear scalability in terms of the database size.

[1]  Jean-François Boulicaut,et al.  Using transposition for pattern discovery from microarray data , 2003, DMKD '03.

[2]  Laks V. S. Lakshmanan,et al.  Exploratory mining and pruning optimizations of constrained associations rules , 1998, SIGMOD '98.

[3]  Anthony K. H. Tung,et al.  Carpenter: finding closed patterns in long biological datasets , 2003, KDD '03.

[4]  Edith Cohen,et al.  Finding interesting associations without support pruning , 2000, Proceedings of 16th International Conference on Data Engineering (Cat. No.00CB37073).

[5]  Nicolas Pasquier,et al.  Discovering Frequent Closed Itemsets for Association Rules , 1999, ICDT.

[6]  Jian Tang,et al.  Mining N-most Interesting Itemsets , 2000, ISMIS.

[7]  Jean-François Boulicaut,et al.  Free-Sets: A Condensed Representation of Boolean Data for the Approximation of Frequency Queries , 2004, Data Mining and Knowledge Discovery.

[8]  Gerd Stumme,et al.  Mining frequent patterns with counting inference , 2000, SKDD.

[9]  Ramakrishnan Srikant,et al.  Fast Algorithms for Mining Association Rules in Large Databases , 1994, VLDB.

[10]  Dino Pedreschi,et al.  ExAnte: Anticipated Data Reduction in Constrained Pattern Mining , 2003, PKDD.

[11]  Mohammed J. Zaki,et al.  CHARM: An Efficient Algorithm for Closed Itemset Mining , 2002, SDM.

[12]  Shinichi Morishita,et al.  Transversing itemset lattices with statistical metric pruning , 2000, PODS '00.

[13]  Ramakrishnan Srikant,et al.  Mining sequential patterns , 1995, Proceedings of the Eleventh International Conference on Data Engineering.

[14]  Jean-François Boulicaut,et al.  Frequent Closures as a Concise Representation for Binary Data Mining , 2000, PAKDD.

[15]  Jian Pei,et al.  CLOSET: An Efficient Algorithm for Mining Frequent Closed Itemsets , 2000, ACM SIGMOD Workshop on Research Issues in Data Mining and Knowledge Discovery.

[16]  Jian Pei,et al.  CLOSET+: searching for the best strategies for mining frequent closed itemsets , 2003, KDD '03.

[17]  Roberto J. Bayardo,et al.  Efficiently mining long patterns from databases , 1998, SIGMOD '98.

[18]  Jiawei Han,et al.  Discovery of Multiple-Level Association Rules from Large Databases , 1995, VLDB.

[19]  Johannes Gehrke,et al.  MAFIA: a maximal frequent itemset algorithm for transactional databases , 2001, Proceedings 17th International Conference on Data Engineering.

[20]  Stephen D. Bay,et al.  Detecting change in categorical data: mining contrast sets , 1999, KDD '99.

[21]  森下 真一,et al.  Parallel Branch-and-Bound Graph Search for Correlated Association Rules , 1999 .

[22]  Mohammed J. Zaki,et al.  Efficient algorithms for mining closed itemsets and their lattice structure , 2005, IEEE Transactions on Knowledge and Data Engineering.

[23]  瀬々 潤,et al.  Traversing Itemset Lattices with Statistical Metric Pruning (小特集 「発見科学」及び一般演題) , 2000 .

[24]  Rajeev Motwani,et al.  Beyond market baskets: generalizing association rules to correlations , 1997, SIGMOD '97.

[25]  Rakesh Agarwal,et al.  Fast Algorithms for Mining Association Rules , 1994, VLDB 1994.

[26]  Christian Hidber,et al.  Association Rule Mining , 2017 .

[27]  Jian Pei,et al.  Mining frequent patterns without candidate generation , 2000, SIGMOD '00.

[28]  Jiawei Han,et al.  CoMine: efficient mining of correlated patterns , 2003, Third IEEE International Conference on Data Mining.

[29]  Hongjun Lu,et al.  On computing, storing and querying frequent patterns , 2003, KDD '03.

[30]  Ke Wang,et al.  Mining confident rules without support requirement , 2001, CIKM '01.