Fast Algorithms for Computing Tree LCS

The LCS of two rooted, ordered, and labeled trees $F$ and $G$ is the largest forest that can be obtained from both trees by deleting nodes. We present algorithms for computing tree LCS that exploit the sparsity inherent to the tree LCS problem. Assuming $G$ is smaller than $F$, our first algorithm runs in time $O(r\cdot {\rm height}(F) \cdot {\rm height}(G)\cdot \lg\lg |G|)$, where $r$ is the number of pairs $(v \in F, w \in G)$ such that $v$ and $w$ have the same label. Our second algorithm runs in time $O(L r \lg r \cdot \lg\lg|G|)$, where $L$ is the size of the LCS of $F$ and $G$. For this algorithm we present a novel three-dimensional alignment graph. Our third algorithm is intended for the constrained variant of the problem, in which only nodes with zero or one children can be deleted. For this case we obtain an $O(r h \lg \lg|G|)$ time algorithm, where $h = {\rm height}(F) + {\rm height}(G)$.
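To make the problem definition and the sparsity parameter $r$ concrete, the sketch below is a minimal brute-force baseline in Python, written under stated assumptions rather than taken from the paper. It represents a tree as a hypothetical `(label, children)` tuple, counts $r$ as the number of equal-label node pairs, and computes the LCS size with the classical rightmost-root forest recursion familiar from tree edit distance, with relabeling forbidden. It is not any of the algorithms described above and does not exploit sparsity; the names `node`, `labels`, `match_count`, `lcs_forest`, and `tree_lcs` are illustrative only.

```python
from collections import Counter
from functools import lru_cache

# Assumed (hypothetical) representation: a tree is (label, children),
# where children is a tuple of trees; a forest is a tuple of trees, left to right.


def node(label, *children):
    """Hypothetical helper: build an ordered, labeled tree."""
    return (label, tuple(children))


def labels(forest):
    """Yield every node label in the forest, in preorder."""
    for label, children in forest:
        yield label
        yield from labels(children)


def match_count(f, g):
    """r: the number of node pairs (v in F, w in G) whose labels are equal."""
    cf, cg = Counter(labels((f,))), Counter(labels((g,)))
    return sum(cf[a] * cg[a] for a in cf)


@lru_cache(maxsize=None)
def lcs_forest(f, g):
    """Size of the LCS of two forests, decomposing at the rightmost roots."""
    if not f or not g:
        return 0
    (v_label, v_children), (w_label, w_children) = f[-1], g[-1]
    best = max(
        # delete v: its children become roots in its place
        lcs_forest(f[:-1] + v_children, g),
        # delete w: likewise in G
        lcs_forest(f, g[:-1] + w_children),
    )
    if v_label == w_label:
        # keep v and w matched to each other: v's children must match inside
        # w's children, and the forests to their left must match each other
        best = max(best, 1 + lcs_forest(v_children, w_children)
                   + lcs_forest(f[:-1], g[:-1]))
    return best


def tree_lcs(f, g):
    """Number of nodes in the LCS of trees f and g (brute-force baseline)."""
    return lcs_forest((f,), (g,))
```

For example, `tree_lcs(node("a", node("b", node("c"))), node("a", node("c")))` evaluates to 2: deleting `b` from the first tree leaves `a(c)`, which is exactly the second tree. The memoized recursion explores many subforest pairs regardless of how few labels match, which is the cost the sparse algorithms above avoid by working only with the $r$ matching pairs.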

Michal Ziv-Ukelson, et al. Fast algorithms for computing tree LCS, 2009, Theor. Comput. Sci.