Efficient string matching

This paper describes a simple, efficient algorithm to locate all occurrences of any of a finite number of keywords in a string of text. The algorithm consists of constructing a finite state pattern matching machine from the keywords and then using the pattern matching machine to process the text string in a single pass. Construction of the pattern matching machine takes time proportional to the sum of the lengths of the keywords. The number of state transitions made by the pattern matching machine in processing the text string is independent of the number of keywords. The algorithm has been used to improve the speed of a library bibliographic search program by a factor of 5 to 10.
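The two phases described above (building a pattern matching machine from the keywords, then scanning the text once) can be illustrated with a minimal sketch. It assumes one common realization of such a machine: a keyword trie augmented with failure links and per-state output lists. The abstract does not spell out the construction, so the names `build_machine`, `find_all`, `goto`, `fail`, and `output` are illustrative choices, and keywords are assumed to be non-empty strings.

```python
from collections import deque


def build_machine(keywords):
    """Build goto, failure, and output tables for the given keywords."""
    goto = [{}]    # goto[state][char] -> next state (a trie of the keywords)
    fail = [0]     # fail[state] -> state to fall back to on a mismatch
    output = [[]]  # output[state] -> keywords that end when state is entered

    # Phase 1: insert each keyword into the trie, creating states as needed.
    for word in keywords:
        state = 0
        for ch in word:
            if ch not in goto[state]:
                goto.append({})
                fail.append(0)
                output.append([])
                goto[state][ch] = len(goto) - 1
            state = goto[state][ch]
        output[state].append(word)

    # Phase 2: compute failure links breadth-first, so that shallower
    # states are finished before the deeper states that depend on them.
    queue = deque(goto[0].values())  # depth-1 states keep fail = 0
    while queue:
        state = queue.popleft()
        for ch, nxt in goto[state].items():
            queue.append(nxt)
            f = fail[state]
            while f and ch not in goto[f]:
                f = fail[f]
            fail[nxt] = goto[f].get(ch, 0)
            # Inherit keywords that end at the fallback state.
            output[nxt] = output[nxt] + output[fail[nxt]]
    return goto, fail, output


def find_all(text, keywords):
    """Return (start_index, keyword) for every occurrence, in one pass."""
    goto, fail, output = build_machine(keywords)
    matches = []
    state = 0
    for i, ch in enumerate(text):
        # Follow failure links until a transition on ch exists or we
        # are back at the start state.
        while state and ch not in goto[state]:
            state = fail[state]
        state = goto[state].get(ch, 0)
        for word in output[state]:
            matches.append((i - len(word) + 1, word))
    return matches
```

For example, `find_all("ushers", ["he", "she", "his", "hers"])` reports `she` at index 1 and both `he` and `hers` at index 2. In this sketch the text is read left to right exactly once and never re-scanned; the only extra work per character is following failure links.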
