Top-down mining of frequent closed patterns from very high dimensional data

Frequent pattern mining is an essential theme in data mining. Existing algorithms usually use a bottom-up search strategy. However, for very high dimensional data, this strategy cannot fully exploit the minimum support constraint to prune the rowset search space. In this paper, we propose a new method called top-down mining, together with a novel row enumeration tree, to make full use of the pruning power of the minimum support constraint. Furthermore, to check efficiently whether a rowset is closed, we develop a technique called the trace-based method. Based on these methods, we design an algorithm called TD-Close for mining the complete set of frequent closed patterns. To enhance its performance further, we add new pruning strategies and new data structures, which leads to a new algorithm, TTD-Close. Our performance study shows that the top-down strategy is effective in cutting down the search space and saving memory, while the trace-based method speeds up closedness checking. As a result, TTD-Close outperforms bottom-up search algorithms such as Carpenter and FPclose in most cases. It also runs faster than TD-Close.
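To illustrate why a top-down search can use the minimum support constraint directly, the following is a minimal sketch of top-down row enumeration. It is not TD-Close or TTD-Close: the function name `top_down_closed_patterns`, the representation of rows as Python sets, and the naive closedness check (recomputing the supporting rows instead of using the trace-based method) are all assumptions made for illustration. The search starts from the full rowset and removes one row at a time, so it can stop as soon as a rowset contains exactly `min_sup` rows, because every rowset below it would be infrequent.

```python
from typing import Dict, FrozenSet, List, Set


def top_down_closed_patterns(rows: List[Set[str]], min_sup: int) -> Dict[FrozenSet[str], int]:
    """Naive top-down row enumeration (illustrative sketch, not TD-Close).

    rows    : one item set per row of the table
    min_sup : minimum number of supporting rows (must be >= 1)
    Returns a dict mapping each frequent closed pattern to its support.
    """
    if min_sup < 1:
        raise ValueError("min_sup must be at least 1")

    n = len(rows)
    results: Dict[FrozenSet[str], int] = {}
    if n < min_sup:
        return results

    all_items: Set[str] = set().union(*rows)

    def common_items(rowset: FrozenSet[int]) -> Set[str]:
        # Items shared by every row in the rowset.
        common = set(all_items)
        for r in rowset:
            common &= rows[r]
        return common

    def supporting_rows(pattern: Set[str]) -> FrozenSet[int]:
        # All rows of the table that contain the pattern.
        return frozenset(i for i, row in enumerate(rows) if pattern <= row)

    def visit(rowset: FrozenSet[int], next_removable: int) -> None:
        pattern = common_items(rowset)
        # Naive closedness check: the rowset is closed if no other row
        # also contains every item of the pattern.
        if pattern and supporting_rows(pattern) == rowset:
            results[frozenset(pattern)] = len(rowset)
        # Minimum support pruning: any child rowset would have fewer
        # than min_sup rows, so the whole subtree is skipped.
        if len(rowset) == min_sup:
            return
        # Remove rows in increasing index order so that each rowset
        # is generated exactly once.
        for r in range(next_removable, n):
            visit(rowset - {r}, r + 1)

    visit(frozenset(range(n)), 0)
    return results


if __name__ == "__main__":
    table = [{"a", "b", "c"}, {"a", "b"}, {"a", "c"}, {"b", "c"}]
    for pattern, support in top_down_closed_patterns(table, 2).items():
        print(sorted(pattern), support)
```

In this sketch the search depth is bounded by the number of rows minus `min_sup`, regardless of how many columns the table has. A bottom-up row enumeration, by contrast, has to combine rows into rowsets of at least `min_sup` rows before it can report anything frequent.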
