An Improved Clique-Based Method for Computing Edit Distance between Unordered Trees and Its Application to Comparison of Glycan Structures

The tree edit distance is one of the most widely used measures for comparing tree-structured data and has been applied to the analysis of RNA secondary structures, glycan structures, and vascular trees. However, the tree edit distance problem is known to be NP-hard for unordered trees, whereas it is solvable in polynomial time for ordered trees. We recently proposed a clique-based method for computing the tree edit distance between unordered trees, in which each instance of the tree edit distance problem is transformed into an instance of the maximum vertex-weighted clique problem and an existing clique algorithm is then applied. In this paper, we propose an improved clique-based method. Unlike our previous method, the improved method is essentially a dynamic programming algorithm that repeatedly solves instances of the maximum vertex-weighted clique problem as sub-problems. We also introduce further heuristic techniques that preserve the optimality of the solution. When applied to the comparison of large glycan structures, the improved method achieved a significant speed-up in most cases.
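
To make the sub-problem concrete, the following is a minimal sketch of an exact search for a maximum vertex-weighted clique. It is not the method proposed in this paper, nor the existing clique algorithm that the method relies on. It is a plain branch-and-bound written for illustration, and it assumes all vertex weights are positive. The function name `max_weight_clique`, the example graph, and its weights are all hypothetical.

```python
from typing import Dict, List, Set, Tuple


def max_weight_clique(weights: Dict[str, float],
                      adj: Dict[str, Set[str]]) -> Tuple[float, List[str]]:
    """Return (total weight, sorted vertices) of a maximum vertex-weighted clique.

    Illustrative sketch only: assumes positive weights and a symmetric
    adjacency map without self-loops.
    """
    best_weight = 0.0
    best_clique: List[str] = []

    def expand(clique: List[str], clique_weight: float,
               candidates: Set[str]) -> None:
        nonlocal best_weight, best_clique
        # Record the current clique if it beats the best one found so far.
        if clique_weight > best_weight:
            best_weight, best_clique = clique_weight, list(clique)
        # Bound: even adding every remaining candidate cannot improve on the best.
        if clique_weight + sum(weights[v] for v in candidates) <= best_weight:
            return
        # Branch on candidates in order of decreasing weight (ties by name).
        ordered = sorted(candidates, key=lambda v: (-weights[v], v))
        for i, v in enumerate(ordered):
            # Only later candidates adjacent to v can extend the clique,
            # so each clique is visited at most once.
            later = set(ordered[i + 1:])
            clique.append(v)
            expand(clique, clique_weight + weights[v], later & adj[v])
            clique.pop()

    expand([], 0.0, set(weights))
    return best_weight, sorted(best_clique)


# Hypothetical example graph: a-b-c form a triangle, c-d and d-e are edges.
example_weights = {"a": 3.0, "b": 2.0, "c": 2.0, "d": 4.0, "e": 1.0}
example_adj = {
    "a": {"b", "c"},
    "b": {"a", "c"},
    "c": {"a", "b", "d"},
    "d": {"c", "e"},
    "e": {"d"},
}
example_result = max_weight_clique(example_weights, example_adj)
```

In this example the triangle `a`, `b`, `c` (total weight 7) beats the heavier single edge `c`-`d` (total weight 6), which shows why the search cannot simply start from the heaviest vertex and stop.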
