Locating maximal approximate runs in a string

Abstract. An exact run in a string $T$ is a non-empty substring of $T$ that consists of at least two consecutive copies of a shorter substring, possibly followed by a proper prefix of that substring. Finding the maximal exact runs in a string is an important and therefore well-studied problem in stringology. For a given string $T$ of length $n$, all maximal exact runs can be found in $O(n \log n)$ time on general ordered alphabets or in $O(n)$ time on integer alphabets. In this paper, we investigate the maximal approximate runs problem: for a given string $T$ and an integer $k$, find all non-empty substrings $T'$ of $T$ such that changing at most $k$ letters in $T'$ transforms it into a maximal exact run. We present an $O(nk^2 \log^2 k + \mathit{occ})$-time algorithm for this problem, where $\mathit{occ}$ is the number of substrings found.
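The definitions above can be made concrete with a small brute-force sketch in Python. It is only an illustration of the problem statement, not the $O(n \log n)$ / $O(n)$ exact-run algorithms or the $O(nk^2 \log^2 k + \mathit{occ})$ algorithm of the paper. All function names are hypothetical. A maximal exact run is reported as a triple (start, end, period) with an exclusive end, and `approx_run_periods` is a deliberate simplification: it only checks whether at most $k$ substitutions make a candidate substring $T'$ periodic with at least two full periods, and it does not check maximality inside $T$.

```python
from collections import Counter


def smallest_period(s: str) -> int:
    """Smallest p >= 1 with s[i] == s[i + p] for every valid i (0 for the empty string)."""
    n = len(s)
    border = [0] * n  # border[i] = length of the longest proper border of s[: i + 1]
    for i in range(1, n):
        b = border[i - 1]
        while b > 0 and s[i] != s[b]:
            b = border[b - 1]
        if s[i] == s[b]:
            b += 1
        border[i] = b
    # The smallest period equals the length minus the longest proper border.
    return n - border[-1] if n else 0


def naive_maximal_runs(t: str) -> list[tuple[int, int, int]]:
    """All maximal exact runs of t as (start, end, period), end exclusive; O(n^3) brute force."""
    n = len(t)
    runs = []
    for i in range(n):
        for j in range(i + 2, n + 1):
            p = smallest_period(t[i:j])
            if j - i < 2 * p:
                continue  # fewer than two full copies of the period: not a run
            # Maximal: extending one letter left or right would break period p.
            left_ok = i == 0 or t[i - 1] != t[i - 1 + p]
            right_ok = j == n or t[j] != t[j - p]
            if left_ok and right_ok:
                runs.append((i, j, p))
    return runs


def min_changes_to_period(s: str, p: int) -> int:
    """Fewest letter substitutions that give s period p (Hamming distance to the nearest p-periodic string)."""
    total = 0
    for r in range(p):
        cls = s[r::p]  # positions congruent to r (mod p) must all hold the same letter
        # Keep the majority letter of the class and overwrite the others.
        total += len(cls) - max(Counter(cls).values(), default=0)
    return total


def approx_run_periods(s: str, k: int) -> list[int]:
    """Periods p with 2p <= len(s) such that at most k substitutions make s p-periodic.

    Simplification (assumption): maximality inside the enclosing text is NOT checked.
    """
    return [p for p in range(1, len(s) // 2 + 1) if min_changes_to_period(s, p) <= k]
```

For example, `naive_maximal_runs("aabaab")` returns `[(0, 2, 1), (0, 6, 3), (3, 5, 1)]`, and `approx_run_periods("abcabd", 1)` returns `[3]`, because changing the final `d` to `c` turns the string into the exact run `abcabc`. The brute force recomputes a period for every substring, which is exactly the cost the algorithms discussed in the abstract avoid.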
