Fast string matching using an n‐gram algorithm

Experimental results are given for the application of a new n‐gram algorithm to substring searching in DNA strings. The results confirm theoretical predictions of expected running time, which were derived under the assumption that the data are drawn from a stationary ergodic source. They also confirm that the algorithms tested are the most efficient known for searches involving larger patterns.
