Fast Algorithms for Extended Regular Expression Matching and Searching

Extended regular expressions extend ordinary regular expressions with the operations of intersection and complement. We give new algorithms for extended regular expression matching and searching that significantly improve the long-standing best upper bound for this problem, due to Hopcroft and Ullman. For an extended regular expression of size $m$ with $p$ intersection and complement operators and an input word of length $n$, our algorithms run in time $O(mn^2)$ and space $O(pn^2)$, whereas the algorithm of Hopcroft and Ullman runs in time $O(mn^3)$ and space $O(mn^2)$. Since the matching problem for semiextended regular expressions (only intersection is added) has very recently been shown to be LOGCFL-complete, our algorithms are very likely the best one can expect. We also emphasize the importance of extended regular expressions for software that currently uses ordinary regular expressions, and we show how the presented algorithms can be improved to run significantly faster in practical applications.
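
To make the problem statement concrete, the following is a minimal sketch of matching and searching for extended regular expressions. It follows the general dynamic-programming idea of computing, for every subexpression, the set of substrings of the input word that it matches. It is not the algorithm of this paper and it does not attain the bounds stated above. The tuple-based expression encoding and the names `spans`, `matches`, and `search` are assumptions made for illustration only.

```python
# Illustrative sketch only: a naive table-based matcher, not the paper's algorithm.
# Expressions are nested tuples (a hypothetical encoding chosen for this example):
#   ("eps",), ("chr", c), ("cat", e1, e2), ("alt", e1, e2),
#   ("and", e1, e2), ("not", e), ("star", e)

def spans(expr, w):
    """Return the set of pairs (i, j) such that w[i:j] is in the language of expr."""
    n = len(w)
    op = expr[0]
    if op == "eps":
        return {(i, i) for i in range(n + 1)}
    if op == "chr":
        return {(i, i + 1) for i in range(n) if w[i] == expr[1]}
    if op == "not":
        # Complement is taken relative to all substrings of w.
        every = {(i, j) for i in range(n + 1) for j in range(i, n + 1)}
        return every - spans(expr[1], w)
    if op == "star":
        inner = spans(expr[1], w)
        result = {(i, i) for i in range(n + 1)}
        frontier = set(result)
        # Repeatedly append one more inner match until nothing new appears.
        while frontier:
            new = {(i, k) for (i, j) in frontier
                   for (j2, k) in inner if j2 == j} - result
            result |= new
            frontier = new
        return result
    left, right = spans(expr[1], w), spans(expr[2], w)
    if op == "alt":
        return left | right
    if op == "and":
        return left & right
    if op == "cat":
        # Join a left match ending at j with a right match starting at j.
        return {(i, k) for (i, j) in left for (j2, k) in right if j == j2}
    raise ValueError(f"unknown operator: {op!r}")

def matches(expr, w):
    """Matching: does the whole word w belong to the language of expr?"""
    return (0, len(w)) in spans(expr, w)

def search(expr, w):
    """Searching: all substrings of w, given as (start, end), that match expr."""
    return sorted(spans(expr, w))

# Words over {a, b} that do not contain "bb", written with intersection and complement.
any_ab = ("star", ("alt", ("chr", "a"), ("chr", "b")))
has_bb = ("cat", any_ab, ("cat", ("chr", "b"), ("cat", ("chr", "b"), any_ab)))
no_bb = ("and", any_ab, ("not", has_bb))
```

In this sketch, intersection and complement are plain set operations on the per-subexpression tables, whereas concatenation and star must combine matches across every split point of the word. That combining step is where the extra factor of $n$ arises in table-based approaches of this kind.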
