Mining Frequent Patterns from Very High Dimensional Data : A Top-Down Row Enumeration Approach

Data sets of very high dimensionality, such as microarray data, pose great challenges on efficient processing to most existing data mining algorithms. Recently, there comes a row-enumeration method that performs a bottom-up search of row combination space to find corresponding frequent patterns. Due to a limited number of rows in microarray data, this method is more efficient than column enumerationbased algorithms. However, the bottom-up search strategy cannot take an advantage of user-specified minimum support threshold to effectively prune search space, and therefore leads to long runtime and much memory overhead. In this paper we propose a new search strategy, top-down mining, integrated with a novel rowenumeration tree, which makes full use of the pruning power of the minimum support threshold to cut down search space dramatically. Using this kind of searching strategy, we design an algorithm, TD-Close, to find a complete set of frequent closed patterns from very high dimensional data. Furthermore, an effective closeness-checking method is also developed that avoids scanning the dataset multiple times. Our performance study shows that the TD-Close algorithm outperforms substantially both Carpenter, a bottom-up searching algorithm, and FPclose, a column enumeration-based frequent closed pattern mining algorithm.

[1]  Ramakrishnan Srikant,et al.  Fast algorithms for mining association rules , 1998, VLDB 1998.

[2]  C. Niehrs,et al.  Synexpression groups in eukaryotes , 1999, Nature.

[3]  Nicolas Pasquier,et al.  Discovering Frequent Closed Itemsets for Association Rules , 1999, ICDT.

[4]  Mohammed J. Zaki Scalable Algorithms for Association Mining , 2000, IEEE Trans. Knowl. Data Eng..

[5]  George M. Church,et al.  Biclustering of Expression Data , 2000, ISMB.

[6]  Jian Pei,et al.  CLOSET: An Efficient Algorithm for Mining Frequent Closed Itemsets , 2000, ACM SIGMOD Workshop on Research Issues in Data Mining and Knowledge Discovery.

[7]  Jian Pei,et al.  Mining frequent patterns by pattern-growth: methodology and implications , 2000, SKDD.

[8]  Anthony K. H. Tung,et al.  Carpenter: finding closed patterns in long biological datasets , 2003, KDD '03.

[9]  Gösta Grahne,et al.  Efficiently Using Prefix-trees in Mining Frequent Itemsets , 2003, FIMI.

[10]  Chad Creighton,et al.  Mining gene expression databases for association rules , 2003, Bioinform..

[11]  Jian Pei,et al.  CLOSET+: searching for the best strategies for mining frequent closed itemsets , 2003, KDD '03.

[12]  Anthony K. H. Tung,et al.  COBBLER: combining column and row enumeration for closed pattern discovery , 2004, Proceedings. 16th International Conference on Scientific and Statistical Database Management, 2004..

[13]  Anthony K. H. Tung,et al.  FARMER: finding interesting rule groups in microarray datasets , 2004, SIGMOD '04.

[14]  Anthony K. H. Tung,et al.  Mining top-K covering rule groups for gene expression data , 2005, SIGMOD '05.

[15]  Mohammed J. Zaki,et al.  CHARM: An Efficient Algorithm for Closed Association Rule Mining , 2007 .