Dynamic all scores matrices for LCS score

The problem of aligning two strings A and B in order to determine their similarity is fundamental in the field of pattern matching. An important concept in this domain is the "all scores matrix", which encodes the local alignment comparison of two strings. Namely, let K denote the all scores matrix containing the alignment score of every substring of B with A, and let J denote the all scores matrix containing the alignment score of every suffix of B with every prefix of A. In this paper we consider the problem of maintaining an all scores matrix where the scoring function is the LCS score, while supporting single-character prepend and append operations to A and B. Our algorithms exploit the sparsity parameters L = LCS(A,B) and Delta = |B| - L. For the matrix K, we propose an algorithm that supports incremental operations to both ends of A in O(Delta) time. For the matrix J, we propose an algorithm that supports a single type of incremental operation, either a prepend operation to A or an append operation to B, in O(L) time. This structure can also be extended to support both operations simultaneously in O(L log log L) time.
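To make the definitions of K, J, L and Delta concrete, the following is a minimal brute-force sketch that recomputes both matrices from scratch. It is not the incremental algorithm of the paper and does not achieve the stated update bounds. The function names (`lcs`, `all_scores_K`, `all_scores_J`) and the half-open indexing convention are assumptions made for illustration only.

```python
def lcs(x: str, y: str) -> int:
    """Length of a longest common subsequence, by the textbook row-by-row DP."""
    prev = [0] * (len(y) + 1)
    for cx in x:
        cur = [0]
        for j, cy in enumerate(y, 1):
            if cx == cy:
                cur.append(prev[j - 1] + 1)
            else:
                cur.append(max(prev[j], cur[j - 1]))
        prev = cur
    return prev[-1]


def all_scores_K(a: str, b: str) -> list[list[int]]:
    """K[i][j] = LCS(A, B[i:j]) for i <= j (assumed convention); 0 otherwise."""
    n = len(b)
    return [[lcs(a, b[i:j]) if i <= j else 0 for j in range(n + 1)]
            for i in range(n + 1)]


def all_scores_J(a: str, b: str) -> list[list[int]]:
    """J[i][j] = LCS(suffix B[i:], prefix A[:j]) (assumed convention)."""
    return [[lcs(b[i:], a[:j]) for j in range(len(a) + 1)]
            for i in range(len(b) + 1)]


A, B = "abcb", "bcab"
L = lcs(A, B)          # sparsity parameter L = LCS(A, B)
Delta = len(B) - L     # sparsity parameter Delta = |B| - L

K = all_scores_K(A, B)
J = all_scores_J(A, B)

# Naive "append to A": recompute K from scratch for A + "a".
K_appended = all_scores_K(A + "a", B)
```

In this sketch, appending one character to A changes each entry of K by 0 or 1, which gives some intuition for why an update can be described by a small number of changes rather than a full recomputation.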
