It has long been known that pattern matching in the Hamming distance metric can be done in O(min(|Σ|, √(m/log m)) · n log m) time, where n is the length of the text, m is the length of the pattern, and Σ is the alphabet. The classic algorithm for this is due to Abrahamson and Kosaraju. This paper considers the following generalization, motivated by situations where the entries of the text and pattern are analog, distorted by additive noise, or imprecisely given for some other reason: in any alignment of the pattern with the text, two aligned symbols a and b contribute +1 to the similarity score if they differ by no more than a given threshold θ, and contribute zero otherwise. We give an O(min(|Σ|, √(m/log m)) · n log m) time algorithm for this more general version of the problem; the classic Hamming distance matching problem is the special case θ = 0.
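To make the scoring rule concrete, here is a minimal brute-force sketch in Python. It is not the algorithm of the paper: it runs in O(nm) time and serves only as a reference for the definition. The function name `threshold_match_scores` and the assumption that symbols are plain numbers are illustrative choices, not taken from the source.

```python
def threshold_match_scores(text, pattern, theta):
    """Return the similarity score for every alignment of pattern with text.

    Two aligned symbols a and b contribute +1 when |a - b| <= theta,
    and 0 otherwise. Symbols are assumed to be numbers (an assumption
    made for this sketch). This is a naive O(n*m) reference, not the
    faster algorithm described in the abstract.
    """
    n, m = len(text), len(pattern)
    if m == 0 or m > n:
        return []  # no valid alignment exists
    scores = []
    for i in range(n - m + 1):  # i is the start of the alignment in text
        score = sum(
            1 for j in range(m) if abs(text[i + j] - pattern[j]) <= theta
        )
        scores.append(score)
    return scores


if __name__ == "__main__":
    # With theta = 1, symbols that differ by at most 1 count as matching.
    print(threshold_match_scores([1, 5, 3, 8, 2], [2, 4], 1))
    # With theta = 0, the score is the number of exact matches, so
    # m minus the score is the classic Hamming distance.
    print(threshold_match_scores([1, 2, 1, 2], [1, 2], 0))
```

Setting `theta` to 0 reduces the score to the count of exactly matching positions, which is the special case mentioned above.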
[1] Szymon Grabowski, et al. Bit-parallel string matching under Hamming distance in O(n⌈m/w⌉) worst case time. Inf. Process. Lett., 2008.
[2] Karl R. Abrahamson. Generalized String Matching. SIAM J. Comput., 1987.
[3] Z. Galil, et al. Improved string matching with k mismatches. SIGA., 1986.
[4] Moshe Lewenstein, et al. Faster algorithms for string matching with k mismatches. SODA '00, 2000.
[5] M. Fischer, et al. String-Matching and Other Products. 1974.
[6] Richard M. Karp, et al. Complexity of Computation. 1974.
[7] Gaston H. Gonnet, et al. A new approach to text searching. CACM, 1992.
[8] Alfred V. Aho, et al. Efficient string matching. 1975.