Fast String Matching with k Differences

Abstract

Consider the string matching problem where differences between characters of the pattern and characters of the text are allowed. Each difference is due to either a mismatch between a character of the text and a character of the pattern, or a superfluous character in the text, or a superfluous character in the pattern. Given a text of length n, a pattern of length m, and an integer k, we present an algorithm for finding all occurrences of the pattern in the text, each with at most k differences. It runs in O(m + nk^2) time for an alphabet whose size is fixed. For general input the algorithm requires O(m log m + nk^2) time. In both cases the space requirement is O(m).
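
To make the problem statement concrete, here is a minimal sketch of the classical O(mn)-time, O(m)-space dynamic-programming baseline for matching with k differences. It is not the O(m + nk^2) algorithm announced in the abstract, whose details are not given here; it only illustrates the three kinds of difference (mismatch, superfluous text character, superfluous pattern character). The function name k_difference_end_positions and the convention of reporting 0-based end positions in the text are assumptions made for this illustration.

```python
def k_difference_end_positions(pattern, text, k):
    """Return every 0-based index j such that some substring of `text`
    ending at j matches `pattern` with at most k differences.

    Baseline dynamic program (hypothetical helper, not the paper's method).
    """
    m = len(pattern)
    # col[i] = fewest differences between pattern[:i] and any substring of
    # the text ending at the current text position.
    col = list(range(m + 1))
    ends = []
    for j, ch in enumerate(text):
        prev_diag = col[0]   # value for (i - 1, j - 1)
        col[0] = 0           # an occurrence may start at any text position
        for i in range(1, m + 1):
            prev_row = col[i]  # value for (i, j - 1), saved before overwrite
            cost = 0 if pattern[i - 1] == ch else 1
            col[i] = min(
                prev_diag + cost,  # match or mismatch
                prev_row + 1,      # superfluous character in the text
                col[i - 1] + 1,    # superfluous character in the pattern
            )
            prev_diag = prev_row
        if col[m] <= k:
            ends.append(j)
    return ends


if __name__ == "__main__":
    print(k_difference_end_positions("abc", "xxabcxx", 0))
    print(k_difference_end_positions("abc", "xxabcxx", 1))
```

Only one column of the table is kept at a time, so the sketch also uses O(m) space, but its running time grows with the product mn rather than with nk^2.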
