A Boyer-Moore type algorithm for regular expression pattern matching

This paper presents a Boyer-Moore-type algorithm for regular expression pattern matching, answering an open problem posed by Aho in 1980 (Pattern Matching in Strings, Academic Press, New York, 1980, p. 342). The new algorithm handles patterns specified by regular expressions, making it a generalization of the Boyer-Moore and Commentz-Walter algorithms. Like those algorithms, it makes use of shift functions that can be precomputed and tabulated. The precomputation algorithms are derived, and it is shown that the required shift functions can be precomputed from Commentz-Walter's d1 and d2 shift functions. In certain cases, the Boyer-Moore (respectively Commentz-Walter) algorithm has greatly outperformed the Knuth-Morris-Pratt (respectively Aho-Corasick) algorithm (as discussed by Watson in his Ph.D. thesis, Eindhoven University of Technology, September 1995, and in: N. Ziviani, R. Baeza-Yates, K. Guimaraes (Eds.), Proc. Third South American Workshop on String Processing, International Informatics Series, vol. 4, Carleton University Press, Recife, Brazil, 1996, pp. 280-294). In testing, the algorithm presented in this paper also frequently outperforms the regular expression generalization of the Aho-Corasick algorithm.
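To make the idea of a precomputed, tabulated shift function concrete, the following is a minimal sketch of the simplest member of this family: the Horspool simplification of Boyer-Moore for a single keyword. It is not the regular expression algorithm of this paper, nor Commentz-Walter's d1 and d2 functions; the function names `horspool_shift_table` and `horspool_search` are hypothetical and chosen only for illustration.

```python
def horspool_shift_table(pattern: str) -> dict:
    """Precompute the bad-character shift table for one keyword.

    For each character of the pattern except the last, store the distance
    from its rightmost occurrence to the end of the pattern. Characters
    absent from the table shift by the full pattern length.
    """
    m = len(pattern)
    return {ch: m - 1 - i for i, ch in enumerate(pattern[:-1])}


def horspool_search(text: str, pattern: str) -> list:
    """Return the start positions of all occurrences of pattern in text."""
    m, n = len(pattern), len(text)
    if m == 0 or m > n:
        return []
    shift = horspool_shift_table(pattern)  # tabulated once, before the scan
    matches = []
    pos = 0
    while pos <= n - m:
        # A slice comparison stands in for the right-to-left character
        # comparison of the original algorithm; the result is the same.
        if text[pos:pos + m] == pattern:
            matches.append(pos)
        # Shift by the table entry for the text character aligned with the
        # last pattern position; unknown characters skip the whole window.
        pos += shift.get(text[pos + m - 1], m)
    return matches
```

For example, `horspool_search("abracadabra", "abra")` finds the occurrences at positions 0 and 7 while skipping past the `d` at position 6 in a single step of four characters. The point of the sketch is only that the shift depends on the pattern alone, so it can be computed before the text is scanned.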

[1]  B. Watson.  A taxonomy of finite automata minimization algorithms, 1993.

[2]  Gerard Zwaan, et al.  A taxonomy of keyword pattern matching algorithms, 1992.

[3]  Robert S. Boyer, et al.  A fast string searching algorithm, 1977, CACM.

[4]  Jeffrey D. Ullman, et al.  Introduction to Automata Theory, Languages and Computation, 1979.

[5]  Hideya Iwasaki.  Great books and papers of the 20th century: D. E. Knuth, J. H. Morris, V. R. Pratt: Fast Pattern Matching in Strings, 2004.

[6]  B. Watson.  A taxonomy of finite automata construction algorithms, 1993.

[7]  Gerard Zwaan, et al.  A Taxonomy of Sublinear Multiple Keyword Pattern Matching Algorithms, 1996, Sci. Comput. Program.

[8]  G. H. Gonnet, et al.  Handbook of algorithms and data structures: in Pascal and C (2nd ed.), 1991.

[9]  Beate Commentz-Walter, et al.  A String Matching Algorithm Fast on the Average, 1979, ICALP.

[10]  Gerard Zwaan, et al.  A new taxonomy of sublinear keyword pattern matching algorithms, 2004.

[11]  Alfred V. Aho, et al.  Efficient string matching, 1975, Commun. ACM.

[12]  Wojciech Rytter, et al.  Text Algorithms, 1994.

[13]  Alfred V. Aho, et al.  Pattern Matching in Strings, 1980.

[14]  Gaston H. Gonnet, et al.  Handbook Of Algorithms And Data Structures, 1984.

[15]  Andrew Hume, et al.  Fast string searching, 1991, USENIX Summer.

[16]  Edsger W. Dijkstra, et al.  A Discipline of Programming, 1976.

[17]  Alfred V. Aho, et al.  Efficient string matching, 1975.

[18]  Alfred V. Aho, et al.  Algorithms for Finding Patterns in Strings, 1991, Handbook of Theoretical Computer Science, Volume A: Algorithms and Complexity.

[19]  Donald E. Knuth, et al.  Fast Pattern Matching in Strings, 1977, SIAM J. Comput.