Paper information - Efficient data mining for maximal frequent subtrees

A new type of tree mining is defined, which uncovers maximal frequent induced subtrees from a database of unordered labeled trees. A novel algorithm, PathJoin, is proposed. The algorithm uses a compact data structure, FST-Forest, which compresses the trees and still keeps the original tree structure. PathJoin generates candidate subtrees by joining the frequent paths in FST-Forest. Such candidate subtree generation is localized and thus substantially reduces the number of candidate subtrees. Experiments with synthetic data sets show that the algorithm is effective and efficient.
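To make the notions of "frequent path" and "maximal" concrete, here is a minimal Python sketch. It is not PathJoin and does not build an FST-Forest. It assumes a deliberate simplification: only root-anchored label paths are counted, and a frequent path is called maximal when no other frequent path extends it. The tree encoding, the function names (`root_paths`, `frequent_root_paths`, `maximal_paths`), and the sample database are all hypothetical, chosen for illustration only.

```python
from collections import Counter

# Assumed encoding: a tree is (label, [child trees]); child order is ignored.

def root_paths(tree, prefix=()):
    """Yield the label path from the root down to every node of the tree."""
    label, children = tree
    path = prefix + (label,)
    yield path
    for child in children:
        yield from root_paths(child, path)

def frequent_root_paths(database, min_support):
    """Map each root path to its support (number of trees containing it),
    keeping only paths whose support reaches min_support."""
    counts = Counter()
    for tree in database:
        counts.update(set(root_paths(tree)))  # count a path once per tree
    return {p: c for p, c in counts.items() if c >= min_support}

def maximal_paths(frequent):
    """Keep frequent paths that are not a proper prefix of another frequent path."""
    return sorted(
        p for p in frequent
        if not any(q != p and q[:len(p)] == p for q in frequent)
    )

# Hypothetical database of three unordered labeled trees.
database = [
    ("a", [("b", [("c", [])]), ("d", [])]),
    ("a", [("d", []), ("b", [("c", [])])]),  # same tree, children reordered
    ("a", [("b", []), ("e", [])]),
]

frequent = frequent_root_paths(database, min_support=2)
print(maximal_paths(frequent))
```

With a minimum support of 2, the paths `a`, `a-b`, `a-b-c`, and `a-d` are frequent, and only `a-b-c` and `a-d` are maximal. Joining such paths into candidate subtrees, as the abstract describes, is the step this sketch leaves out.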
