New Techniques for Mining Frequent Patterns in Unordered Trees

We consider a new tree mining problem that aims to discover restrictedly embedded subtree patterns from a set of rooted labeled unordered trees. We study the properties of a canonical form of unordered trees and develop new Apriori-based techniques that generate all candidate subtrees level by level through two efficient rightmost expansion operations: 1) pairwise joining and 2) leg attachment. Next, we show that restrictedly embedded subtree detection can be achieved by calculating the restricted edit distance between a candidate subtree and a data tree. These techniques are then integrated into an efficient algorithm, named frequent restrictedly embedded subtree miner (FRESTM), to solve the tree mining problem at hand. We prove the correctness of FRESTM and discuss its time and space complexities. Experimental results on synthetic and real-world data demonstrate the effectiveness of the proposed approach.
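
To illustrate why a canonical form matters for unordered trees, the following minimal sketch computes one common canonical encoding: each node's children are encoded recursively and then sorted, so that any two sibling orderings of the same unordered tree yield the same string. This is a generic textbook construction and not necessarily the canonical form studied in this paper; the `(label, children)` tuple representation and the function name `canonical_encoding` are assumptions made for the example, and labels are assumed to contain no parentheses.

```python
from typing import List, Tuple

# Hypothetical representation: a tree is a (label, children) pair.
Tree = Tuple[str, List["Tree"]]


def canonical_encoding(tree: Tree) -> str:
    """Return a string that is identical for all sibling orderings of `tree`.

    Assumes labels contain no '(' or ')' so the encoding stays unambiguous.
    """
    label, children = tree
    # Encode every child first, then sort the encodings so that
    # the order in which siblings were listed no longer matters.
    child_codes = sorted(canonical_encoding(child) for child in children)
    return label + "(" + "".join(child_codes) + ")"


# The same unordered tree written with two different sibling orders.
t1: Tree = ("a", [("c", []), ("b", [("d", [])])])
t2: Tree = ("a", [("b", [("d", [])]), ("c", [])])

# A structurally different tree: 'd' hangs under 'c' instead of 'b'.
t3: Tree = ("a", [("b", []), ("c", [("d", [])])])

same = canonical_encoding(t1) == canonical_encoding(t2)
different = canonical_encoding(t1) != canonical_encoding(t3)
```

With an encoding of this kind, a miner can test whether two candidate subtrees are the same unordered pattern by comparing strings, which avoids counting one pattern several times during level-wise candidate generation.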
