Mining Frequent Induced Subtree Patterns with Subtree-Constraint

Mining frequent induced subtree patterns is useful in domains such as XML databases and Web log analysis. However, because of combinatorial explosion, mining all frequent subtree patterns becomes infeasible for a large, dense tree database. Too many frequent subtree patterns also overwhelm users, since usually only a small subset of the mining results is of interest to them. In this paper, we propose the problem of discovering frequent induced subtree patterns that are supertrees of a pattern tree specified by the user, i.e., frequent induced subtree patterns with a subtree constraint. Most existing frequent subtree mining algorithms are based on right-most extension, which does not work well for this new problem, so we present free extension as a replacement. To avoid the duplicate patterns caused by free extension, we develop an efficient method that ensures no duplicate patterns arise during mining or in the results. We then give the subtree-constraint frequent subtree pattern mining algorithm, SCFS. Experimental results show that the algorithm achieves good performance.
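To make the problem statement concrete, the following is a minimal brute-force sketch of what it means for a pattern to be frequent and to satisfy a subtree constraint. It is not the SCFS algorithm and uses neither free extension nor duplicate elimination. It assumes rooted, labeled, ordered trees encoded as nested tuples `(label, (child, ...))`; the encoding and the function names (`matches_at`, `contains_induced`, `support`, `is_constrained_frequent`) are hypothetical choices made for illustration only.

```python
def matches_at(tree, pattern):
    """True if `pattern` occurs as an induced subtree rooted at the root of `tree`.

    Assumption: trees are ordered, so the pattern's children must map, in
    order, to a subsequence of the tree node's children (parent-child edges
    are preserved, which is what "induced" requires).
    """
    t_label, t_children = tree
    p_label, p_children = pattern
    if t_label != p_label:
        return False
    i = 0
    for p_child in p_children:
        # Greedily take the earliest remaining tree child that matches.
        while i < len(t_children) and not matches_at(t_children[i], p_child):
            i += 1
        if i == len(t_children):
            return False
        i += 1
    return True


def contains_induced(tree, pattern):
    """True if `pattern` is an induced subtree of `tree` at any node."""
    if matches_at(tree, pattern):
        return True
    return any(contains_induced(child, pattern) for child in tree[1])


def support(database, pattern):
    """Number of database trees that contain `pattern`."""
    return sum(1 for tree in database if contains_induced(tree, pattern))


def is_constrained_frequent(database, candidate, constraint, min_support):
    """Candidate must be a supertree of the constraint and be frequent."""
    return (contains_induced(candidate, constraint)
            and support(database, candidate) >= min_support)


# Toy database of three trees (illustrative data, not from the paper).
db = [
    ("a", (("b", ()), ("c", (("d", ()),)))),
    ("a", (("c", (("d", ()),)), ("e", ()))),
    ("b", (("c", ()),)),
]
constraint = ("c", (("d", ()),))            # user-specified pattern tree
good = ("a", (("c", (("d", ()),)),))        # supertree of the constraint
bad = ("a", (("b", ()),))                   # does not contain the constraint

print(is_constrained_frequent(db, good, constraint, min_support=2))
print(is_constrained_frequent(db, bad, constraint, min_support=1))
```

Under these assumptions, `good` is accepted because it contains the constraint tree and occurs in two of the three database trees, while `bad` is rejected by the constraint check regardless of its support. A real miner would enumerate candidates by growing the constraint tree rather than testing them one by one.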
