Faster algorithms for string matching with k mismatches

The string matching with mismatches problem is that of finding the number of mismatches between a pattern P of length m and every length-m substring of a text T of length n. Currently, the fastest algorithms for this problem are the following. The Galil-Giancarlo algorithm finds all locations where the pattern has at most k errors (where k is part of the input) in time O(nk). The Abrahamson algorithm finds the number of mismatches at every location in time O(n√(m log m)). We present an algorithm that is faster than both. Our algorithm finds all locations where the pattern has at most k errors in time O(n√(k log k)). We also show an algorithm that solves the above problem in time O((n + nk³/m) log k).
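To make the problem statement concrete, here is a minimal sketch of the two baselines named above. It is not the O(n√(k log k)) algorithm of this paper. The function `mismatch_counts` is a brute-force O(nm) reference for "number of mismatches at every location". The function `k_mismatch_positions` follows the "kangaroo" idea behind O(nk) methods such as Galil-Giancarlo: at each alignment, jump over the longest common extension (LCE), count one mismatch, and stop after k+1 mismatches. One assumption to flag: Galil-Giancarlo answer LCE queries in O(1) with a suffix tree and constant-time lowest common ancestor queries. This sketch swaps in polynomial rolling hashes with binary search, which gives O(nk log m) time and is correct only with high probability (a hash collision could undercount mismatches). All names here (`PrefixHash`, `lce`, `mismatch_counts`, `k_mismatch_positions`) are hypothetical.

```python
import random


class PrefixHash:
    """Polynomial rolling hash of every prefix of s, modulo a Mersenne prime."""
    MOD = (1 << 61) - 1

    def __init__(self, s: str, base: int):
        self.h = [0] * (len(s) + 1)   # h[i] = hash of s[:i]
        self.p = [1] * (len(s) + 1)   # p[i] = base**i mod MOD
        for i, ch in enumerate(s):
            self.h[i + 1] = (self.h[i] * base + ord(ch)) % self.MOD
            self.p[i + 1] = (self.p[i] * base) % self.MOD

    def get(self, i: int, j: int) -> int:
        """Hash of the substring s[i:j]."""
        return (self.h[j] - self.h[i] * self.p[j - i]) % self.MOD


def lce(th: PrefixHash, ph: PrefixHash, i: int, j: int, limit: int) -> int:
    """Longest L <= limit with text[i:i+L] == pattern[j:j+L], by binary search.

    Assumption: hash equality stands in for string equality (Monte Carlo),
    replacing the suffix-tree + LCA query used by the O(nk) algorithms.
    """
    lo, hi = 0, limit
    while lo < hi:
        mid = (lo + hi + 1) // 2
        if th.get(i, i + mid) == ph.get(j, j + mid):
            lo = mid
        else:
            hi = mid - 1
    return lo


def mismatch_counts(text: str, pattern: str) -> list[int]:
    """Brute force: Hamming distance of pattern to every length-m window, O(nm)."""
    n, m = len(text), len(pattern)
    return [sum(a != b for a, b in zip(text[i:i + m], pattern))
            for i in range(n - m + 1)]


def k_mismatch_positions(text: str, pattern: str, k: int, seed: int = 0) -> list[int]:
    """Start positions where pattern occurs in text with at most k mismatches.

    Each alignment uses at most k+1 LCE jumps ("kangaroo" method).
    """
    n, m = len(text), len(pattern)
    base = random.Random(seed).randrange(256, PrefixHash.MOD - 1)
    th, ph = PrefixHash(text, base), PrefixHash(pattern, base)
    result = []
    for i in range(n - m + 1):
        j = 0
        mismatches = 0
        while True:
            j += lce(th, ph, i + j, j, m - j)   # skip the matching run
            if j == m:
                break
            mismatches += 1                     # text[i+j] != pattern[j]
            if mismatches > k:
                break
            j += 1                              # step over the mismatch
        if mismatches <= k:
            result.append(i)
    return result
```

For example, with T = "abracadabra" and P = "abr", the mismatch counts are [0, 3, 3, 2, 3, 2, 3, 0, 3], so k = 0 reports positions [0, 7] and k = 2 reports [0, 3, 5, 7]. The design point is that the work per alignment depends on k rather than m, which is what makes O(nk)-type bounds possible once LCE queries are cheap.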
