Mining Frequent Embedded Subtree from Tree-Like Databases

Mining frequent subtrees from databases of labeled trees is a new research field with many practical applications in areas such as computer networks, Web mining, bioinformatics, and XML document mining. These applications share a need for the greater expressive power of labeled trees to capture complex relations among data entities. This paper introduces an efficient algorithm for mining frequent, ordered, embedded subtrees in tree-like databases. Using a new data structure called the scope-list, a canonical representation of tree nodes, the algorithm first generates all candidate trees, then enumerates embedded, ordered trees, and finally joins scope-lists to compute the frequency of the embedded, ordered trees. Experiments show that the algorithm performs about 15% better than other similar mining methods and scales up well.
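The scope-list idea can be illustrated with a small sketch. The abstract does not specify the tree encoding, the scope-list layout, or the join conditions, so the Python below assumes the conventions of Zaki's TreeMiner, which may differ from this paper's exact variant. Each tree is a pre-order string of labels in which -1 means "return to the parent". Each node carries a scope (l, r), where l is its pre-order index and r is the index of its rightmost descendant. A scope-list entry is (tree id, positions matching the pattern prefix, scope of the pattern's last node). The function names scopes, single_node_lists, join and support are hypothetical. The join has an "in" mode (the new node is an embedded descendant of the last node) and an "out" mode (the new node lies in a later sibling branch). A full miner derives the mode from the nodes' attachment positions and enumerates the candidates itself; this sketch leaves both to the caller.

```python
from collections import defaultdict

BACKTRACK = -1  # assumed encoding: -1 means "return to the parent node"


def scopes(encoding):
    """Return [(label, (l, r))] in pre-order, where l is the node's pre-order
    index and r is the index of its rightmost descendant (r == l for a leaf)."""
    labels, right, stack = [], [], []
    for tok in encoding:
        if tok == BACKTRACK:
            node = stack.pop()
            right[node] = len(labels) - 1  # last node seen closes this subtree
        else:
            labels.append(tok)
            right.append(None)
            stack.append(len(labels) - 1)
    while stack:  # nodes still open at the end (closing -1s omitted) span to the last node
        right[stack.pop()] = len(labels) - 1
    return [(lab, (i, right[i])) for i, lab in enumerate(labels)]


def single_node_lists(db):
    """Scope-lists of all 1-node patterns: label -> [(tid, (), scope)]."""
    lists = defaultdict(list)
    for tid, enc in enumerate(db):
        for label, scope in scopes(enc):
            lists[label].append((tid, (), scope))
    return lists


def join(list_x, list_y, mode):
    """Join two scope-lists whose patterns share the same prefix.

    mode == "in":  y must lie strictly inside x's scope (embedded descendant).
    mode == "out": y must start after x's scope ends (later sibling branch).
    Entries are combined only if they come from the same tree and match the
    prefix at the same nodes; the new prefix match adds x's position.
    """
    joined = []
    for tx, mx, (xl, xr) in list_x:
        for ty, my, (yl, yr) in list_y:
            if tx != ty or mx != my:
                continue
            inside = xl < yl and yr <= xr
            after = xr < yl
            if (mode == "in" and inside) or (mode == "out" and after):
                joined.append((tx, mx + (xl,), (yl, yr)))
    return joined


def support(scope_list):
    """Number of distinct trees containing at least one occurrence."""
    return len({tid for tid, _, _ in scope_list})


db = [
    ["a", "b", -1, "c", -1],            # T0: a(b, c)
    ["a", "d", "b", -1, -1, "c", -1],   # T1: a(d(b), c), b is embedded under a
    ["a", "c", -1, "b", -1],            # T2: a(c, b), siblings in the other order
]
L = single_node_lists(db)
ab = join(L["a"], L["b"], "in")   # pattern: b embedded below a
ac = join(L["a"], L["c"], "in")   # pattern: c embedded below a
abc = join(ab, ac, "out")         # ordered pattern a(b, c)

print(scopes(db[1]))  # [('a', (0, 3)), ('d', (1, 2)), ('b', (2, 2)), ('c', (3, 3))]
print(support(ab))    # 3
print(support(abc))   # 2
```

In this toy database the ordered pattern a(b, c) occurs in T0 and in T1, where b sits below an intermediate node d. An embedded match allows that. It does not occur in T2, where the siblings appear in the opposite order, so its support is 2. The nested-loop join is quadratic in the list sizes. Grouping entries by tree id before joining would be the obvious refinement.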
