Approximation and Parameterized Algorithms for Common Subtrees and Edit Distance between Unordered Trees

Theoretical Computer Science

Given two rooted, labeled, unordered trees, the common subtree problem is to find a maximum-cardinality bijective matching between subsets of nodes of the two trees that preserves labels and ancestry relationships. The tree edit distance problem is to determine the least-cost sequence of insertions, deletions and substitutions that converts one tree into the other. Both problems are known to be hard to approximate within some constant factor in general. We tackle these problems from two perspectives: exact algorithms, either for special cases or in terms of some parameters; and approximation algorithms together with hardness of approximation. We present an algorithm, parameterized by the number of branching nodes, that solves both problems and yields polynomial-time algorithms for several special classes of trees. This is complemented by a tighter APX-hardness proof that holds when the trees are of height one and two, respectively. Furthermore, we present the first approximation algorithms for both problems. In particular, for the common subtree problem for $t$ trees, we present an algorithm achieving a $t \log^2(b_{OPT}+1)$ ratio, where $b_{OPT}$ is the number of branching nodes in the optimal solution. We also present constant-factor approximation algorithms for both problems in the case of bounded-height trees.
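
To make the common subtree definition concrete, the following is a minimal brute-force sketch for tiny inputs. It is not any of the algorithms announced in the abstract, and it runs in exponential time. The tree encoding (a parent array with `None` for the root, plus a label list) and the names `ancestors` and `largest_common_subtree` are assumptions made for illustration only.

```python
def ancestors(parent):
    """Return anc[v] = set of proper ancestors of node v, given a parent array."""
    anc = []
    for v in range(len(parent)):
        seen = set()
        p = parent[v]
        while p is not None:
            seen.add(p)
            p = parent[p]
        anc.append(seen)
    return anc


def largest_common_subtree(parent1, labels1, parent2, labels2):
    """Exhaustively search for a largest label- and ancestry-preserving matching.

    Returns a list of pairs (u, v): node u of the first tree matched to
    node v of the second tree. Exponential time; illustration only.
    """
    anc1, anc2 = ancestors(parent1), ancestors(parent2)
    n1, n2 = len(parent1), len(parent2)
    best = []

    def consistent(u, v, pairs):
        # The new pair must agree with every earlier pair on ancestry,
        # in both directions.
        for a, b in pairs:
            if (a in anc1[u]) != (b in anc2[v]):
                return False
            if (u in anc1[a]) != (v in anc2[b]):
                return False
        return True

    def extend(u, pairs, used):
        nonlocal best
        # Prune: even matching every remaining node cannot beat the best so far.
        if len(pairs) + (n1 - u) <= len(best):
            return
        if u == n1:
            best = list(pairs)
            return
        for v in range(n2):
            if v not in used and labels1[u] == labels2[v] and consistent(u, v, pairs):
                pairs.append((u, v))
                used.add(v)
                extend(u + 1, pairs, used)
                pairs.pop()
                used.remove(v)
        extend(u + 1, pairs, used)  # leave node u unmatched

    extend(0, [], set())
    return best


# First tree: root 'a' with children 'b' and 'c'.
# Second tree: root 'a' with a single child 'x', whose children are 'c' and 'b'.
matching = largest_common_subtree(
    [None, 0, 0], ["a", "b", "c"],
    [None, 0, 1, 1], ["a", "x", "c", "b"],
)
print(len(matching))
```

Because the trees are unordered, the sibling order of `b` and `c` does not matter here, and the intermediate node `x` can be skipped as long as ancestry between matched nodes is preserved.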
