Fast String Matching in Stationary Ergodic Sources

A connection is made between ergodic theory and the expected complexity of string searching. In particular, a substring search algorithm is introduced which, when applied to text produced by an appropriate stationary ergodic source, has an expected running time of O((N/m + m) log m) for a text string of length N and a search string of length m. Similar expected-complexity results have been obtained before, but here the analysis is carried out in a significantly more general framework, one that models the statistics of many types of strings, including natural language, more accurately. The analysis also sheds light on the performance of the Boyer-Moore and Sunday algorithms when applied to natural-language text.
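To make the algorithms named in the abstract more concrete, the Python sketch below shows two scanning strategies. The first is Sunday's Quick Search shift rule, where the character just past the current window decides how far to shift. The second is a hypothetical q-gram filter, not claimed to be the paper's algorithm. It reads only the last q characters of each length-m window. If that q-gram never occurs in the pattern, no match can start in the next m - q + 1 positions, so they are skipped. When the text is "random enough" most windows are rejected after reading about q characters, which gives the intuition behind bounds of the form O((N/m + m) log m). The function names and the parameter `q` are illustrative assumptions.

```python
def sunday_search(text: str, pattern: str) -> list[int]:
    """Sunday's Quick Search: after each window, shift according to the
    last position in the pattern of the character just past the window."""
    m, n = len(pattern), len(text)
    if m == 0:
        raise ValueError("pattern must be non-empty")
    # shift[c] = m - (last index of c in pattern); later indices overwrite earlier ones.
    shift = {c: m - i for i, c in enumerate(pattern)}
    matches = []
    i = 0
    while i <= n - m:
        if text[i:i + m] == pattern:
            matches.append(i)
        if i + m >= n:  # no character beyond the window: nothing left to check
            break
        # Characters absent from the pattern allow the maximal shift m + 1.
        i += shift.get(text[i + m], m + 1)
    return matches


def qgram_filter_search(text: str, pattern: str, q: int) -> list[int]:
    """Hypothetical illustration: inspect only the last q characters of each
    length-m window; if that q-gram never occurs in the pattern, no match can
    start in the next m - q + 1 positions, so they are skipped."""
    m, n = len(pattern), len(text)
    if not 1 <= q <= m:
        raise ValueError("require 1 <= q <= len(pattern)")
    # Index every q-gram of the pattern by the offsets where it occurs.
    offsets: dict[str, list[int]] = {}
    for k in range(m - q + 1):
        offsets.setdefault(pattern[k:k + q], []).append(k)
    matches = []
    i = 0
    while i <= n - m:
        gram = text[i + m - q:i + m]
        # Any occurrence starting at j in [i, i + m - q] contains `gram` at
        # pattern offset k = i + m - q - j, so only those j need verifying.
        for k in offsets.get(gram, []):
            j = i + m - q - k
            if j + m <= n and text[j:j + m] == pattern:
                matches.append(j)
        i += m - q + 1  # every start in [i, i + m - q] has now been decided
    return sorted(matches)
```

Both functions return the same start positions. For example, searching "abracadabra" for "abra" gives [0, 7] with either function (using q = 2 for the filter). A natural choice for q, assumed here rather than taken from the paper, is about log_σ m for an alphabet of size σ, so that a typical window's q-gram is unlikely to occur in the pattern.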

John Shawe-Taylor

[1] John Shawe-Taylor et al. Fast string matching using an n-gram algorithm, 1994, Softw. Pract. Exp.

[2] John Shawe-Taylor et al. An Approximate String-Matching Algorithm, 1992, Theor. Comput. Sci.

[3] Daniel Sunday et al. A very fast substring search algorithm, 1990, CACM.

[4] Ricardo A. Baeza-Yates et al. String Searching Algorithms Revisited, 1989, WADS.

[5] Robert Schaback et al. On the Expected Sublinearity of the Boyer-Moore Algorithm, 1988, SIAM J. Comput.

[6] Dominic J. A. Welsh et al. Codes and cryptography, 1988.

[7] R. Nigel Horspool et al. Practical fast searching in strings, 1980, Softw. Pract. Exp.

[8] Andrew Chi-Chih Yao et al. The Complexity of Pattern Matching for a Random String, 1977, SIAM J. Comput.

[9] Robert S. Boyer et al. A fast string searching algorithm, 1977, CACM.

[10] Leonidas J. Guibas et al. A new proof of the linearity of the Boyer-Moore string searching algorithm, 1977, 18th Annual Symposium on Foundations of Computer Science (sfcs 1977).

[11] Donald E. Knuth et al. Fast Pattern Matching in Strings, 1977, SIAM J. Comput.

[12] P. Billingsley et al. Ergodic theory and information, 1966.

[13] A. Thomasian. An Elementary Proof of the AEP of Information Theory, 1960.

[14] John Shawe-Taylor et al. Fast Expected string Machine using an n-gram Algorithm, 1994.

[15] G. de V. Smit et al. A Comparison of Three String Matching Algorithms, 1982, Softw. Pract. Exp.