Longest Common Substring Made Fully Dynamic

In the longest common substring (LCS) problem, we are given two strings $S$ and $T$, each of length at most $n$, and asked to find a longest string occurring as a fragment of both $S$ and $T$. This is a classical and well-studied problem in computer science with a known $\mathcal{O}(n)$-time solution. In the fully dynamic version of the problem, edit operations are allowed in either of the two strings, and we are asked to report an LCS after each such operation. We present the first solution to this problem that requires sublinear time per edit operation. In particular, we show how to return an LCS in $\tilde{\mathcal{O}}(n^{2/3})$ time (or $\tilde{\mathcal{O}}(\sqrt{n})$ time if edits are allowed in only one of the two strings) after each operation, using $\tilde{\mathcal{O}}(n)$ space. This line of research was recently initiated by the authors [SPIRE 2017] in a somewhat restricted dynamic variant: they presented an $\tilde{\mathcal{O}}(n)$-sized data structure that returns an LCS of the two strings in $\tilde{\mathcal{O}}(1)$ time after a single edit operation (that is reverted afterwards). At CPM 2018, three papers studied analogously restricted dynamic variants of problems on strings. We show that our techniques can be used to obtain fully dynamic algorithms for several classical problems on strings, namely, computing the longest repeat, the longest palindrome, and the longest Lyndon substring of a string. The only previously known sublinear-time dynamic algorithms for problems on strings were obtained for maintaining a dynamic collection of strings for comparison queries and for pattern matching, with the most recent advances made by Gawrychowski et al. [SODA 2018] and by Clifford et al. [STACS 2018].
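To make the problem statement concrete, the following is a minimal sketch of the static LCS problem and of the naive dynamic baseline that recomputes the answer from scratch after every edit. It is not the algorithm of this paper: it uses a simple quadratic-time dynamic program rather than the $\mathcal{O}(n)$-time solution or the sublinear-time updates described above. The function names `longest_common_substring` and `substitute` are hypothetical, chosen only for illustration.

```python
def longest_common_substring(s: str, t: str) -> str:
    """Return one longest string occurring as a fragment of both s and t.

    Illustrative O(|s| * |t|) dynamic program, not the linear-time method.
    cur[j] holds the length of the longest common suffix of s[:i] and t[:j].
    """
    best_len, best_end = 0, 0          # best match is s[best_end - best_len : best_end]
    prev = [0] * (len(t) + 1)
    for i in range(1, len(s) + 1):
        cur = [0] * (len(t) + 1)
        for j in range(1, len(t) + 1):
            if s[i - 1] == t[j - 1]:
                cur[j] = prev[j - 1] + 1
                if cur[j] > best_len:
                    best_len, best_end = cur[j], i
        prev = cur
    return s[best_end - best_len:best_end]


def substitute(s: str, i: int, c: str) -> str:
    """Hypothetical helper: one edit operation (substitution at position i)."""
    return s[:i] + c + s[i + 1:]


# Naive fully dynamic baseline: recompute the LCS after each edit.
S, T = "abcde", "xbcdy"
before = longest_common_substring(S, T)   # "bcd"
T = substitute(T, 4, "e")                 # T becomes "xbcde"
after = longest_common_substring(S, T)    # "bcde"
```

Recomputing from scratch costs at least linear time per edit even with the best static algorithm, which is the cost the sublinear-time bounds in the abstract improve upon.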