String Matching with Weighted Errors

Abstract In the approximate string matching problem, differences are allowed between the pattern string P and each of its occurrences in the text string T, and the goal is to find all occurrences of P in T with at most k differences. We consider weighted differences (errors) between P and T and develop fast sequential and parallel algorithms. In particular, we allow the following types of errors: a mismatch, whose weight depends on the mismatching characters; an extra character, with constant weight; a missing character, with constant weight; and a transposition of two consecutive characters, with constant weight. A set of theoretical results allows known algorithms to be extended to solve this problem in O(kn) sequential time and O(k + log m) parallel time on a 4PRAM model with max{n + k + 1, mp2} processors, where k is the maximum sum of the error weights, n is the length of T, and m is the length of P.
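To make the error model concrete, the following is a minimal sketch of a plain O(mn) dynamic program for the four weighted error types listed in the abstract. It is not the O(kn) algorithm the paper develops; it only illustrates what "an occurrence with total error weight at most k" means. The function name `weighted_matches`, the parameter names, and the reading of "extra" as a text character with no counterpart in P and "missing" as a pattern character absent from T are assumptions made for illustration.

```python
def weighted_matches(pattern, text, k, mismatch,
                     w_extra=1, w_missing=1, w_transp=1):
    """Return (end_position, weight) for every text prefix end where the
    pattern matches some substring with total error weight <= k.

    Assumed conventions (not taken from the paper):
      mismatch(p, t) -> weight of substituting pattern char p by text char t
      w_extra        -> weight of a text character with no pattern counterpart
      w_missing      -> weight of a pattern character absent from the text
      w_transp       -> weight of swapping two consecutive characters
    """
    m, n = len(pattern), len(text)
    # D[i][j] = least weight aligning pattern[:i] with some suffix of text[:j].
    # Row 0 is all zeros, so an occurrence may start anywhere in the text.
    D = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(1, m + 1):
        D[i][0] = i * w_missing
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            p, t = pattern[i - 1], text[j - 1]
            # Match (weight 0) or character-dependent mismatch.
            best = D[i - 1][j - 1] + (0 if p == t else mismatch(p, t))
            # Missing pattern character, or extra text character.
            best = min(best, D[i - 1][j] + w_missing, D[i][j - 1] + w_extra)
            # Transposition of two consecutive characters.
            if i > 1 and j > 1 and p == text[j - 2] and pattern[i - 2] == t:
                best = min(best, D[i - 2][j - 2] + w_transp)
            D[i][j] = best
    return [(j, D[m][j]) for j in range(1, n + 1) if D[m][j] <= k]


if __name__ == "__main__":
    # Every mismatch costs 2; the other three error types cost 1.
    print(weighted_matches("abc", "xxacbxx", 1, lambda a, b: 2))
    # -> [(4, 1), (5, 1)]
```

In the usage line, the match ending at position 4 drops the pattern's `b` (one missing character), and the match ending at position 5 reads `acb` as a transposition of `bc`. End positions are 1-based, matching the row and column indexing of the table.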
