Approximate Labelled Subtree Homeomorphism

Given two undirected trees T and P, the Subtree Homeomorphism Problem asks whether T has a subtree t that can be transformed into P by removing entire subtrees and by repeatedly removing a degree-2 node and adding an edge joining its two neighbors. In this paper we extend the Subtree Homeomorphism Problem to a new optimization problem by enriching the subtree comparison with node-to-node similarity scores. The new problem, denoted ALSH (Approximate Labelled Subtree Homeomorphism), is to compute the homeomorphic subtree of T that also maximizes the overall node-to-node resemblance. We describe an O(m^2 n / log m + mn log n) algorithm for solving ALSH on unordered, unrooted trees, where m and n are the number of vertices in P and T, respectively. We also give an O(mn) algorithm for rooted ordered trees.
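The two transformations in the problem definition can be illustrated with a minimal sketch. It is not the paper's algorithm; it only shows the operations that turn a subtree t into P. The adjacency-set representation and the helper names `build_tree`, `remove_subtree`, and `contract_degree2` are assumptions made for this illustration.

```python
from typing import Dict, Hashable, Iterable, Set, Tuple

# Assumed representation: an undirected tree as a dict of neighbour sets.
Tree = Dict[Hashable, Set[Hashable]]


def build_tree(edges: Iterable[Tuple[Hashable, Hashable]]) -> Tree:
    """Build an adjacency-set tree from a list of undirected edges."""
    adj: Tree = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj


def remove_subtree(adj: Tree, root: Hashable, parent: Hashable) -> None:
    """Delete the whole subtree that hangs off `parent` through edge (parent, root)."""
    adj[parent].discard(root)
    stack = [(root, parent)]
    while stack:
        node, came_from = stack.pop()
        # Drop the node and continue into every neighbour except the one we came from.
        for nxt in adj.pop(node):
            if nxt != came_from:
                stack.append((nxt, node))


def contract_degree2(adj: Tree, v: Hashable) -> None:
    """Remove a degree-2 node v and add an edge joining its two neighbours."""
    if len(adj[v]) != 2:
        raise ValueError("node must have degree exactly 2")
    a, b = adj.pop(v)
    adj[a].discard(v)
    adj[b].discard(v)
    adj[a].add(b)
    adj[b].add(a)


# Hypothetical example tree: path 1-2-3-4 with an extra leaf 5 attached to node 2.
t = build_tree([(1, 2), (2, 3), (3, 4), (2, 5)])
remove_subtree(t, root=5, parent=2)  # node 2 now has degree 2
contract_degree2(t, 2)               # path becomes 1-3-4
contract_degree2(t, 3)               # path becomes 1-4
```

After these three steps the example tree has been reduced to the single edge joining nodes 1 and 4, so it is homeomorphic to a two-node pattern P. ALSH additionally scores which node of T is matched to which node of P; this sketch does not model that scoring.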
