Paper Information - The Noisy Substring Matching Problem

The Noisy Substring Matching Problem

Let T(U) be the set of words in the dictionary H that contain U as a substring. The problem considered here is the estimation of the set T(U) when U is not known, but Y, a noisy version of U, is available. The suggested set estimate S*(Y) of T(U) is a proper subset of H such that every element of it contains at least one substring that most closely resembles Y according to the Levenshtein metric. The proposed algorithm for the computation of S*(Y) requires cubic time. The algorithm uses the recursively computable dissimilarity measure Dk(X, Y), termed the kth distance between two strings X and Y, which is a dissimilarity measure between Y and a certain subset of the set of contiguous substrings of X. Another estimate of T(U), namely SM(Y), is also suggested. The accuracy of SM(Y) is only slightly less than that of S*(Y), but the computation time of SM(Y) is substantially less than that of S*(Y). Experimental results involving 1900 noisy substrings and dictionaries that are subsets of the 1023 most common English words [11] indicate that the accuracy of the estimate S*(Y) is around 99 percent and that of SM(Y) is about 98 percent.
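To make the set estimate concrete, the following is a minimal Python sketch of the idea stated in the abstract: for each dictionary word X, find the smallest Levenshtein distance between Y and any contiguous substring of X, then keep the words that attain the overall minimum. It is not the paper's algorithm. The kth distance Dk(X, Y) is not reproduced here; the sketch swaps in the standard approximate-substring dynamic program (first row initialized to zero so a match may start anywhere in X), with unit edit costs assumed. The names `min_substring_distance` and `estimate_set` are hypothetical.

```python
from typing import Iterable, Set


def min_substring_distance(x: str, y: str) -> int:
    """Smallest unit-cost Levenshtein distance between y and any
    contiguous substring of x (the empty substring included)."""
    # prev[j] holds the best distance between the current prefix of y and
    # some substring of x ending at position j. The row for the empty
    # prefix of y is all zeros: a match may start anywhere in x for free.
    prev = [0] * (len(x) + 1)
    for i in range(1, len(y) + 1):
        curr = [i] + [0] * len(x)  # y[:i] against the empty substring costs i
        for j in range(1, len(x) + 1):
            cost = 0 if y[i - 1] == x[j - 1] else 1
            curr[j] = min(
                prev[j - 1] + cost,  # match or substitute
                prev[j] + 1,         # character of y with no partner in x
                curr[j - 1] + 1,     # extra character of x inside the match
            )
        prev = curr
    # The match may also end anywhere in x, so take the minimum over j.
    return min(prev)


def estimate_set(dictionary: Iterable[str], y: str) -> Set[str]:
    """Words whose best-matching substring is closest to y (assumed rule)."""
    distances = {word: min_substring_distance(word, y) for word in dictionary}
    if not distances:
        return set()
    best = min(distances.values())
    return {word for word, d in distances.items() if d == best}


if __name__ == "__main__":
    words = ["string", "matching", "substring", "noise"]
    print(sorted(estimate_set(words, "strng")))
```

For a word of length n and a noisy string of length m, this sketch costs O(nm) per word, so a full pass over the dictionary is linear in the total dictionary length times m. It illustrates the selection rule only and makes no claim about the accuracy figures reported above.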

B. John Oommen | Rangasami L. Kashyap

[1] Godfrey Dewey, et al. Relativ frequency of English speech sounds, 1923.

[2] Michael J. Fischer, et al. The String-to-String Correction Problem, 1974, JACM.

[3] Tamotsu Kasai, et al. A Method for the Correction of Garbled Words Based on the Levenshtein Metric, 1976, IEEE Transactions on Computers.

[4] Robert S. Boyer, et al. A fast string searching algorithm, 1977, CACM.

[5] Donald E. Knuth, et al. Fast Pattern Matching in Strings, 1977, SIAM J. Comput.

[6] Daniel S. Hirschberg, et al. Algorithms for the Longest Common Subsequence Problem, 1977, JACM.

[7] Rangasami L. Kashyap, et al. Syntactic Decision Rules for Recognition of Spoken Words and Phrases Using a Stochastic Automaton, 1979, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[8] B. John Oommen, et al. An effective algorithm for string correction using generalized edit distances--I. Description of the algorithm and its optimality, 1981, Inf. Sci.

[9] B. John Oommen, et al. An effective algorithm for string correction using generalized edit distance - II. Computational complexity of the algorithm and some applications, 1981, Inf. Sci.