Approximate Boyer-Moore String Matching

The Boyer–Moore idea used in exact string matching is generalized to approximate string matching. Two versions of the problem are considered. The k mismatches problem is to find all approximate occurrences of a pattern string (length m) in a text string (length n) with at most k mismatches. The generalized Boyer–Moore algorithm is shown (under a mild independence assumption) to solve the problem in expected time $O(kn(\frac{1}{m - k} + \frac{k}{c}))$, where c is the size of the alphabet. A related algorithm is developed for the k differences problem, where the task is to find all approximate occurrences of a pattern in a text with at most k differences (insertions, deletions, changes). Experimental evaluation of the algorithms is reported, showing that the new algorithms are often significantly faster than the old ones. Both algorithms are functionally equivalent to the Horspool version of the Boyer–Moore algorithm when $k = 0$.
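
To make the k mismatches problem concrete, here is a minimal Python sketch of a Horspool-style scan that tolerates up to k mismatches. It is an illustration written for this summary, not the authors' algorithm as published: the function name `k_mismatch_search`, the per-position shift tables, and the conservative cap of m - k on the shift are all assumptions chosen for simplicity. The shift rule rests on a pigeonhole argument: any alignment with at most k mismatches must match at least one of the last k + 1 text characters of the current window, so shifts that match none of them can be skipped.

```python
def k_mismatch_search(text, pattern, k):
    """Return start positions where pattern occurs in text with <= k mismatches.

    Illustrative sketch only (hypothetical helper, simplified shift rule).
    """
    m, n = len(pattern), len(text)
    if not 0 <= k < m:
        raise ValueError("need 0 <= k < len(pattern)")

    # Only the last k + 1 window positions are used to compute the shift.
    first = m - k - 1
    max_shift = m - k  # conservative cap; equals m (plain Horspool) when k == 0

    # tables[i - first][c] = smallest s >= 1 with pattern[i - s] == c,
    # i.e. the smallest shift that aligns text character c (currently at
    # window position i) with an equal pattern character.
    tables = []
    for i in range(first, m):
        table = {}
        for s in range(1, i + 1):
            table.setdefault(pattern[i - s], s)
        tables.append(table)

    hits = []
    pos = 0
    while pos + m <= n:
        # Compare right to left, stopping once more than k mismatches are seen.
        mismatches = 0
        for i in range(m - 1, -1, -1):
            if text[pos + i] != pattern[i]:
                mismatches += 1
                if mismatches > k:
                    break
        if mismatches <= k:
            hits.append(pos)

        # Take the smallest shift that lets at least one of the last k + 1
        # window characters match; fall back to max_shift otherwise.
        shift = max_shift
        for offset, table in enumerate(tables):
            c = text[pos + first + offset]
            shift = min(shift, table.get(c, max_shift))
        pos += shift
    return hits
```

For example, `k_mismatch_search("abcabdabc", "abc", 1)` returns `[0, 3, 6]`: the windows `abc`, `abd`, and `abc` each differ from the pattern in at most one position. With `k = 0` the shift tables collapse to the single Horspool bad-character table, which matches the equivalence noted above. The table construction here takes O(km) time for clarity and makes no attempt to reproduce the preprocessing or the expected-time bound from the paper.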
