A Clique-Based Method Using Dynamic Programming for Computing Edit Distance Between Unordered Trees

Advances in molecular biology have made many kinds of tree-structured data available, such as RNA secondary structures. Various similarity measures have been developed and applied to analyze such data, and tree edit distance is among the most widely used. However, computing the tree edit distance between unordered trees is NP-hard, so algorithms that are efficient in practice are needed. A practical clique-based algorithm was recently proposed, but it is slow for large trees. This article presents an improved clique-based method for computing the tree edit distance between unordered trees, obtained by adding a dynamic programming scheme and heuristic techniques to the previous clique-based method. To evaluate its efficiency, we applied the improved method to the comparison of real tree-structured data, such as glycan structures. For large trees, the improved method is much faster than the previous method; in particular, for hard instances it achieved a speed-up of more than 100 times.
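As a rough illustration of how unordered tree edit distance can be cast as a clique problem, the following minimal sketch builds a compatibility graph over node pairs and searches it by brute force. It is not the method of this article: it uses no dynamic programming or heuristics, it assumes unit costs for insertion, deletion, and relabeling, and the tree encoding (a parent array plus a label list) and the function names `ancestors` and `unordered_ted` are hypothetical choices made for the example. It relies on the standard characterization of the distance as the cost of the cheapest one-to-one, ancestor-preserving mapping between the node sets.

```python
from itertools import product


def ancestors(parent):
    """Return anc[i] = set of proper ancestors of node i (root has parent -1)."""
    anc = []
    for i in range(len(parent)):
        found = set()
        p = parent[i]
        while p != -1:
            found.add(p)
            p = parent[p]
        anc.append(found)
    return anc


def unordered_ted(parent1, labels1, parent2, labels2):
    """Unit-cost edit distance between two small rooted unordered trees.

    Exponential-time sketch: a vertex is a node pair (u, v), two vertices
    are adjacent when they can coexist in one ancestor-preserving mapping,
    and a maximum-weight clique corresponds to a cheapest mapping.
    """
    anc1, anc2 = ancestors(parent1), ancestors(parent2)
    pairs = list(product(range(len(parent1)), range(len(parent2))))
    # Mapping u to v saves one deletion and one insertion (gain 2),
    # minus one relabel when the labels differ (assumed unit costs).
    weight = {(u, v): 2 if labels1[u] == labels2[v] else 1 for u, v in pairs}

    def compatible(p, q):
        (u1, v1), (u2, v2) = p, q
        if u1 == u2 or v1 == v2:
            return False  # the mapping must be one-to-one
        # Ancestor relations must agree in both directions.
        return ((u1 in anc1[u2]) == (v1 in anc2[v2])
                and (u2 in anc1[u1]) == (v2 in anc2[v1]))

    best = 0

    def extend(candidates, total):
        nonlocal best
        best = max(best, total)
        # Prune when even taking every remaining candidate cannot improve.
        if total + sum(weight[p] for p in candidates) <= best:
            return
        for i, p in enumerate(candidates):
            rest = [q for q in candidates[i + 1:] if compatible(p, q)]
            extend(rest, total + weight[p])

    extend(pairs, 0)
    return len(parent1) + len(parent2) - best


if __name__ == "__main__":
    # a(b, c) versus a(c, b): identical as unordered trees.
    print(unordered_ted([-1, 0, 0], "abc", [-1, 0, 0], "acb"))
```

The search enumerates cliques naively, so it is only usable on trees with a handful of nodes. It is meant to make the reduction concrete, not to reflect the performance characteristics discussed above.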
