Linear-Time Algorithm for Long LCF with k Mismatches

In the Longest Common Factor with $k$ Mismatches (LCF$_k$) problem, we are given two strings $X$ and $Y$ of total length $n$ and are asked to find a pair of maximal-length factors, one of $X$ and one of $Y$, whose Hamming distance is at most $k$. Thankachan et al. showed that this problem can be solved in $\mathcal{O}(n \log^k n)$ time and $\mathcal{O}(n)$ space for constant $k$. We consider the LCF$_k$($\ell$) problem, in which the sought factors are assumed to have length at least $\ell$. We call the special case $\ell=\Omega(\log^{2k+2} n)$ the Long LCF$_k$ problem. Using difference covers, we reduce the Long LCF$_k$ problem to a task involving $m=\mathcal{O}(n/\log^{k+1}n)$ synchronized factors. This task can be solved in $\mathcal{O}(m \log^{k+1}m)$ time, which is $\mathcal{O}(n)$ because $\log m \le \log n$, and so yields a linear-time algorithm for Long LCF$_k$. For arbitrary $\ell$, our solution to LCF$_k$($\ell$) takes $\mathcal{O}(n + n \log^{k+1} n/\sqrt{\ell})$ time.
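To make the problem statement concrete, the following is a minimal quadratic-time baseline for LCF$_k$, not the linear-time algorithm described above. It slides a window along every diagonal of the alignment between $X$ and $Y$ and keeps at most $k$ mismatches inside the window. The function name `lcf_k` and its return format are assumptions made for this illustration.

```python
def lcf_k(x: str, y: str, k: int) -> tuple[int, int, int]:
    """Return (length, i, j) so that x[i:i+length] and y[j:j+length]
    have Hamming distance at most k and length is as large as possible.

    Illustrative O(|x| * |y|) baseline; assumes k >= 0.
    """
    n, m = len(x), len(y)
    best = (0, 0, 0)
    # Diagonal d = i - j pairs x[i] with y[i - d].
    for d in range(-(m - 1), n):
        i0 = max(d, 0)                 # first position of the diagonal in x
        j0 = i0 - d                    # first position of the diagonal in y
        length = min(n - i0, m - j0)   # number of aligned character pairs
        left = 0                       # window start, as an offset on the diagonal
        mismatches = 0                 # mismatches inside the current window
        for right in range(length):
            if x[i0 + right] != y[j0 + right]:
                mismatches += 1
            # Shrink from the left until at most k mismatches remain.
            while mismatches > k:
                if x[i0 + left] != y[j0 + left]:
                    mismatches -= 1
                left += 1
            width = right - left + 1
            if width > best[0]:
                best = (width, i0 + left, j0 + left)
    return best


if __name__ == "__main__":
    for k in range(3):
        print(k, lcf_k("abcde", "xbcdy", k))
```

For `X = "abcde"` and `Y = "xbcdy"`, the longest common factor has length 3 (`"bcd"`) with no mismatches, length 4 with one mismatch, and length 5 with two. The window on each diagonal only moves forward, so each diagonal costs time linear in its length.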
