Longest common extensions in trees

The longest common extension (LCE) of two indices in a string is the length of the longest identical substrings starting at these two indices. The LCE problem asks to preprocess a string into a compact data structure that supports fast LCE queries. In this paper we generalize the LCE problem to trees and suggest a few applications of LCE in trees to tries and XML databases. Given a labeled and rooted tree $T$ of size $n$, the goal is to preprocess $T$ into a compact data structure that supports the following LCE queries between subpaths and subtrees in $T$. Let $v_1$, $v_2$, $w_1$, and $w_2$ be nodes of $T$ such that $w_1$ and $w_2$ are descendants of $v_1$ and $v_2$, respectively.

$\mathrm{LCE}_{PP}(v_1, w_1, v_2, w_2)$ (path-path LCE): return the longest common prefix of the paths $v_1 \leadsto w_1$ and $v_2 \leadsto w_2$.

$\mathrm{LCE}_{PT}(v_1, w_1, v_2)$ (path-tree LCE): return a maximal path-path LCE of the path $v_1 \leadsto w_1$ and any path from $v_2$ to a descendant leaf.

$\mathrm{LCE}_{TT}(v_1, v_2)$ (tree-tree LCE): return a maximal path-path LCE of any pair of paths from $v_1$ and $v_2$ to descendant leaves.

We present the first non-trivial bounds for supporting these queries. For $\mathrm{LCE}_{PP}$ queries, we present a linear-space solution with $O(\log^* n)$ query time. For $\mathrm{LCE}_{PT}$ queries, we present a linear-space solution with $O((\log \log n)^2)$ query time, and complement this with a lower bound showing that any path-tree LCE structure of size $O(n \, \mathrm{polylog}(n))$ must necessarily use $\Omega(\log \log n)$ time to answer queries. For $\mathrm{LCE}_{TT}$ queries, we present a time-space trade-off that, given any parameter $\tau$ with $1 \le \tau \le n$, leads to a solution with $O(n\tau)$ space and $O(n/\tau)$ query time. All of these bounds hold on a standard unit-cost RAM model with logarithmic word size. This is complemented with a reduction from the set intersection problem implying that a fast linear-space solution is not likely to exist.
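To make the three query types concrete, the following is a minimal Python sketch of naive, definition-level baselines. It is not the data structures of the paper and does not achieve the stated bounds. It assumes that labels sit on nodes, that a path includes both of its endpoints, and that a query returns the length of the common prefix. The names `LabeledTree`, `lce_pp`, `lce_pt`, and `lce_tt` are hypothetical and chosen only for illustration.

```python
from typing import Dict, List, Optional


class LabeledTree:
    """Rooted tree with one label per node (hypothetical helper, for illustration only)."""

    def __init__(self, parent: Dict[int, Optional[int]], label: Dict[int, str]):
        self.parent = parent
        self.label = label
        self.children: Dict[int, List[int]] = {v: [] for v in parent}
        for v, p in parent.items():
            if p is not None:
                self.children[p].append(v)

    def path(self, v: int, w: int) -> List[int]:
        """Nodes on the downward path from v to w; w must be v or a descendant of v."""
        nodes = [w]
        while nodes[-1] != v:
            p = self.parent[nodes[-1]]
            if p is None:
                raise ValueError("w is not a descendant of v")
            nodes.append(p)
        return nodes[::-1]


def lce_pp(t: LabeledTree, v1: int, w1: int, v2: int, w2: int) -> int:
    """Path-path LCE: length of the common label prefix of v1 ~> w1 and v2 ~> w2."""
    p1, p2 = t.path(v1, w1), t.path(v2, w2)
    k = 0
    while k < min(len(p1), len(p2)) and t.label[p1[k]] == t.label[p2[k]]:
        k += 1
    return k


def lce_pt(t: LabeledTree, v1: int, w1: int, v2: int) -> int:
    """Path-tree LCE: best match of v1 ~> w1 against any downward path from v2."""
    p1 = t.path(v1, w1)

    def match(i: int, u: int) -> int:
        # Stop when the path is exhausted or the labels disagree.
        if i == len(p1) or t.label[p1[i]] != t.label[u]:
            return 0
        return 1 + max((match(i + 1, c) for c in t.children[u]), default=0)

    return match(0, v2)


def lce_tt(t: LabeledTree, v1: int, v2: int) -> int:
    """Tree-tree LCE: best match over all pairs of downward paths from v1 and v2."""
    if t.label[v1] != t.label[v2]:
        return 0
    pairs = (lce_tt(t, c1, c2) for c1 in t.children[v1] for c2 in t.children[v2])
    return 1 + max(pairs, default=0)


# Small example tree: node 0 is the root with children 1 and 2.
parent = {0: None, 1: 0, 2: 0, 3: 1, 4: 1, 5: 2, 6: 2, 7: 3, 8: 5}
label = {0: "r", 1: "a", 2: "a", 3: "b", 4: "c", 5: "b", 6: "c", 7: "x", 8: "y"}
t = LabeledTree(parent, label)

print(lce_pp(t, 1, 7, 2, 8))  # paths "abx" and "aby" share the prefix "ab": 2
print(lce_pt(t, 1, 4, 2))     # path "ac" matches the path 2 -> 6 in full: 2
print(lce_tt(t, 1, 2))        # best pair of downward paths shares 2 labels: 2
```

These baselines take time proportional to the path lengths or, for `lce_tt`, to the number of node pairs explored, which is the cost the preprocessing-based solutions described above are designed to avoid.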
