A Sub-quadratic Algorithm for Approximate Limited Expression Matching 1

In this paper we present an efficient sub-quadratic time algorithm for matching strings and limited expressions in large texts. Limited expressions are a subset of regular expressions that appear often in practice. The generalization from simple strings to limited expressions has a negligible affect on the speed of our algorithm, yet allows much more flexibility. Our algorithm is similar in spirit to that of Masek and Paterson [MP80], but it is much faster in practice. Our experiments show a factor of 4 to 5 speedup against the algorithms of Sellers [Se80] and Ukkonen [Uk85] independent of the sizes of the input strings. Experiments also reveal our algorithm to be faster, in most cases, than a recent improvement by Chang and Lampe [CL92], especially for small alphabet sizes for which it is 2 to 3

[1]  Michael J. Fischer,et al.  The String-to-String Correction Problem , 1974, JACM.

[2]  Alfred V. Aho,et al.  Efficient string matching , 1975, Commun. ACM.

[3]  Uzi Vishkin,et al.  Fast String Matching with k Differences , 1988, J. Comput. Syst. Sci..

[4]  Udi Manber,et al.  Fast text searching: allowing errors , 1992, CACM.

[5]  Zvi Galil,et al.  An Improved Algorithm for Approximate String Matching , 1990, SIAM J. Comput..

[6]  Beate Commentz-Walter,et al.  A String Matching Algorithm Fast on the Average , 1979, ICALP.

[7]  Peter H. Sellers,et al.  The Theory and Computation of Evolutionary Distances: Pattern Recognition , 1980, J. Algorithms.

[8]  Jordan Lampe,et al.  Theoretical and Empirical Comparisons of Approximate String Matching Algorithms , 1992, CPM.

[9]  Eugene W. Myers,et al.  A Four Russians algorithm for regular expression pattern matching , 1992, JACM.

[10]  Robert S. Boyer,et al.  A fast string searching algorithm , 1977, CACM.

[11]  Esko Ukkonen,et al.  Finding Approximate Patterns in Strings , 1985, J. Algorithms.

[12]  Gaston H. Gonnet,et al.  A new approach to text searching , 1989, SIGIR '89.

[13]  Esko Ukkonen,et al.  Approximate String Matching with q-grams and Maximal Matches , 1992, Theor. Comput. Sci..

[14]  Mike Paterson,et al.  A Faster Algorithm Computing String Edit Distances , 1980, J. Comput. Syst. Sci..

[15]  Donald E. Knuth,et al.  Fast Pattern Matching in Strings , 1977, SIAM J. Comput..

[16]  Eugene L. Lawler,et al.  Approximate string matching in sublinear expected time , 1990, Proceedings [1990] 31st Annual Symposium on Foundations of Computer Science.