Paper information - Top-down mining of frequent closed patterns from very high dimensional data

Frequent pattern mining is an essential theme in data mining. Existing algorithms usually use a bottom-up search strategy. However, for very high dimensional data, this strategy cannot fully utilize the minimum support constraint to prune the rowset search space. In this paper, we propose a new method called top-down mining together with a novel row enumeration tree to make full use of the pruning power of the minimum support constraint. Furthermore, to efficiently check if a rowset is closed, we develop a method called the trace-based method. Based on these methods, an algorithm called TD-Close is designed for mining a complete set of frequent closed patterns. To enhance its performance further, we improve it by using new pruning strategies and new data structures that lead to a new algorithm TTD-Close. Our performance study shows that the top-down strategy is effective in cutting down search space and saving memory space, while the trace-based method facilitates the closeness-checking. As a result, the algorithm TTD-Close outperforms the bottom-up search algorithms such as Carpenter and FPclose in most cases. It also runs faster than TD-Close.
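The top-down idea described in the abstract can be illustrated with a small sketch. It is not the authors' TD-Close or TTD-Close implementation: the function name `top_down_closed_patterns`, the row/item representation, and the enumeration order are assumptions made for illustration, and the closedness check below simply recomputes the support set instead of using the paper's trace-based method. The sketch starts from the full rowset, removes one row at a time, and stops descending as soon as a rowset reaches the minimum support size, since every smaller rowset would be infrequent.

```python
from typing import Dict, FrozenSet, List, Set


def top_down_closed_patterns(
    rows: List[Set[str]], minsup: int
) -> Dict[FrozenSet[str], FrozenSet[int]]:
    """Map each frequent closed itemset to the set of row ids that support it.

    Illustrative only: a naive top-down row enumeration, not TD-Close itself.
    """
    n = len(rows)
    results: Dict[FrozenSet[str], FrozenSet[int]] = {}

    def common_items(rowset: FrozenSet[int]) -> Set[str]:
        # Items shared by every row in the rowset.
        ids = iter(rowset)
        items = set(rows[next(ids)])
        for r in ids:
            items &= rows[r]
        return items

    def support_set(items: Set[str]) -> FrozenSet[int]:
        # All rows that contain every item of the itemset.
        return frozenset(i for i, row in enumerate(rows) if items <= row)

    def visit(rowset: FrozenSet[int], start: int) -> None:
        items = common_items(rowset)
        # Naive closedness check (assumption, not the trace-based method):
        # the rowset must be exactly the support set of its common items.
        if items and support_set(items) == rowset:
            results[frozenset(items)] = rowset
        # Minimum support pruning: any child would have fewer than minsup rows.
        if len(rowset) == minsup:
            return
        # Remove rows in increasing id order so each rowset is visited once.
        for r in range(start, n):
            visit(rowset - {r}, r + 1)

    if 1 <= minsup <= n:
        visit(frozenset(range(n)), 0)
    return results
```

Because the search moves from large rowsets to small ones, the minimum support threshold bounds the depth of the enumeration directly, which is the pruning advantage the abstract attributes to the top-down strategy on data with few rows and very many columns.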
