An Improved Clique-Based Method for Computing Edit Distance between Rooted Unordered Trees

Tree structures are well suited to representing biological objects such as RNA secondary structures, so comparing tree structures is an important task in computational biology. Although various metrics have been proposed for measuring similarity between tree-structured data, tree edit distance is one of the most widely used. However, the tree edit distance problem is known to be NP-hard for unordered trees. Fukagawa et al. recently proposed a clique-based method for computing the tree edit distance between unordered trees. It transforms each instance of the tree edit distance problem into an instance of the maximum vertex-weighted clique problem and then applies an existing clique algorithm. In this article, we propose an improved clique-based method for computing the tree edit distance between rooted unordered trees. Unlike the previous method, ours combines dynamic programming with the clique-based approach. We also introduce heuristic techniques that do not compromise the optimality of the solution. When applied to the comparison of large glycan structures, the improved method is much faster than the previous method in most cases.
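The clique reduction used by the previous method can be illustrated with a small sketch. This is not the authors' implementation, and several assumptions apply. All insertion, deletion and relabeling operations have unit cost. Trees are passed as hypothetical parent arrays plus label lists. A plain include/exclude branch-and-bound search stands in for a dedicated maximum-clique solver.

Each node pair (u, v) becomes a vertex. Its weight is 2 if the labels match and 1 otherwise, which is the cost saved by mapping u to v instead of deleting u and inserting v. Two vertices are adjacent when the pairs are one-to-one and preserve the ancestor relation, which is the condition for a valid mapping between unordered trees. The distance is then |V1| + |V2| minus the maximum clique weight. The dynamic programming decomposition and the heuristics of the improved method are not shown.

```python
from itertools import combinations


def ancestors(parent):
    """For each node, the set of its proper ancestors.
    Assumed input format: parent[i] is the parent of node i, or -1 for the root."""
    anc = []
    for i in range(len(parent)):
        s, p = set(), parent[i]
        while p != -1:
            s.add(p)
            p = parent[p]
        anc.append(s)
    return anc


def compatible(a, b, anc1, anc2):
    """Distinct pairs (u1, v1), (u2, v2) can coexist in an unordered mapping
    iff the mapping stays one-to-one and preserves the ancestor relation."""
    (u1, v1), (u2, v2) = a, b
    if u1 == u2 or v1 == v2:
        return False  # each node may be mapped at most once
    # u1 is an ancestor of u2 exactly when v1 is an ancestor of v2 (and vice versa)
    return ((u1 in anc1[u2]) == (v1 in anc2[v2]) and
            (u2 in anc1[u1]) == (v2 in anc2[v1]))


def max_weight_clique(weights, adj):
    """Simple branch and bound (not Tomita's algorithm): branch on including
    or excluding the first candidate and prune when even taking every
    remaining candidate cannot beat the best clique found so far."""
    best = 0

    def expand(cands, cur):
        nonlocal best
        if cur + sum(weights[c] for c in cands) <= best:
            return
        if not cands:
            best = cur  # cur > best here, because the bound check passed
            return
        v = cands[0]
        # include v: keep only candidates adjacent to v
        expand([c for c in cands[1:] if c in adj[v]], cur + weights[v])
        # exclude v
        expand(cands[1:], cur)

    expand(list(range(len(weights))), 0)
    return best


def tree_edit_distance(parent1, labels1, parent2, labels2):
    """Unit-cost unordered tree edit distance via maximum vertex-weighted clique."""
    anc1, anc2 = ancestors(parent1), ancestors(parent2)
    pairs = [(u, v) for u in range(len(parent1)) for v in range(len(parent2))]
    # saving of mapping u -> v: 2 if labels match, 1 if a relabel is needed
    weights = [2 - (labels1[u] != labels2[v]) for u, v in pairs]
    adj = [set() for _ in pairs]
    for i, j in combinations(range(len(pairs)), 2):
        if compatible(pairs[i], pairs[j], anc1, anc2):
            adj[i].add(j)
            adj[j].add(i)
    return len(parent1) + len(parent2) - max_weight_clique(weights, adj)


# Unordered trees a(b, c) and a(c, b) differ only in sibling order.
print(tree_edit_distance([-1, 0, 0], ["a", "b", "c"],
                         [-1, 0, 0], ["a", "c", "b"]))  # 0
```

The search is exponential in the number of node pairs, so this sketch is only practical for very small trees.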

[1] Erik D. Demaine et al., An optimal decomposition algorithm for tree edit distance, 2006, TALG.

[2] Susumu Goto et al., KEGG for representation and analysis of molecular networks involving diseases and drugs, 2009, Nucleic Acids Res.

[3] Atsuhiro Takasu et al., A clique-based method for the edit distance between unordered trees and its application to analysis of glycan structures, 2011, BMC Bioinformatics.

[4] Tatsuya Akutsu et al., Efficient Algorithms for Finding Maximum and Maximal Cliques: Effective Tools for Bioinformatics, 2011.

[5] Kuo-Chung Tai, The Tree-to-Tree Correction Problem, 1979, JACM.

[6] Philip Bille et al., A survey on tree edit distance and related problems, 2005, Theor. Comput. Sci.

[7] Tatsuya Akutsu et al., KCaM (KEGG Carbohydrate Matcher): a software tool for analyzing the structures of carbohydrate sugar chains, 2004, Nucleic Acids Res.

[8] Atsuhiro Takasu et al., Exact algorithms for computing the tree edit distance between unordered trees, 2010, Theor. Comput. Sci.

[9] William E. Higgins et al., System for the analysis and visualization of large 3D anatomical trees, 2007, Comput. Biol. Medicine.

[10] Bin Ma et al., A General Edit Distance between RNA Structures, 2002, J. Comput. Biol.

[11] Kaizhong Zhang et al., On the Editing Distance Between Unordered Labeled Trees, 1992, Inf. Process. Lett.

[12] Yair Horesh et al., Designing an A* Algorithm for Calculating Edit Distance between Rooted-Unordered Trees, 2006, J. Comput. Biol.

[13] Etsuji Tomita et al., An Efficient Branch-and-Bound Algorithm for Finding a Maximum Clique, 2003, DMTCS.