Mining Unordered Distance-Constrained Embedded Subtrees

Frequent subtree mining is an important problem in association rule mining from semi-structured or tree-structured documents, which are common in commercial, web and scientific domains. This paper presents the u3Razor algorithm for mining unordered embedded subtrees in which the distance of each node from the root of the subtree must be taken into account. Mining distance-constrained unordered embedded subtrees has important applications in web information systems, conceptual model analysis and more sophisticated knowledge matching. An encoding strategy is presented that efficiently enumerates candidate unordered embedded subtrees while taking the distance of each node from the subtree root into account. Both synthetic and real-world datasets are used for experimental evaluation and discussion.
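
To make the idea of a distance-aware encoding for unordered subtrees concrete, the following is a minimal illustrative sketch. It is not the u3Razor encoding itself, whose details are not given here. It assumes a hypothetical representation in which each subtree node is a tuple `(label, dist, children)`, where `dist` is the node's distance (in edges of the original tree) from the subtree root, and it sorts the encodings of siblings so that sibling order does not affect the result.

```python
# Illustrative sketch only: a canonical string for an unordered subtree
# whose nodes carry their distance from the subtree root.
# The node layout (label, dist, children) and the function name `encode`
# are assumptions made for this example, not part of the paper.

def encode(node):
    """Return an order-independent, distance-aware string for a subtree."""
    label, dist, children = node
    # Sorting the child encodings makes sibling order irrelevant,
    # which is what "unordered" requires.
    child_codes = sorted(encode(child) for child in children)
    return f"{label}@{dist}[" + " ".join(child_codes) + "]"


# Same nodes, different sibling order: treated as the same subtree.
t1 = ("a", 0, [("b", 1, []), ("c", 2, [])])
t2 = ("a", 0, [("c", 2, []), ("b", 1, [])])

# Same labels, but "b" lies at distance 2 instead of 1: a different subtree
# once distance to the root is taken into account.
t3 = ("a", 0, [("b", 2, []), ("c", 2, [])])

print(encode(t1) == encode(t2))  # sibling order is ignored
print(encode(t1) == encode(t3))  # distances are not ignored
```

Under this sketch, two embedded subtrees with identical labels and ancestor-descendant structure are counted together only when their nodes also sit at the same distances from the root, which is the distinction a distance-constrained miner needs to preserve.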
