Provably Shorter Regular Expressions from Deterministic Finite Automata

We study the problem of finding good elimination orderings for the state elimination algorithm, one of the most popular algorithms for converting finite automata into equivalent regular expressions. Using graph separator techniques, we describe elimination strategies that first remove states forming large induced subgraphs of the underlying automaton that are "simple" (for example, independent sets or subgraphs of bounded treewidth), and that lead to regular expressions of moderate size. In particular, we show that there is an elimination ordering such that every language over a binary alphabet accepted by an n-state deterministic finite automaton has alphabetic width at most O(1.742^n); to our knowledge, this is the conversion algorithm with the best currently known performance guarantee. Finally, we apply our technique to the question of how language operations affect regular expression size. For the intersection operation we prove an upper bound that matches, up to a small factor, a lower bound recently obtained in [9,10], and thus settles an open problem stated in [7].
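The effect of the elimination ordering can be illustrated with a minimal Python sketch of textbook state elimination. This is not the separator-based strategy described above; the function names (`state_elimination`, `alphabetic_width`), the string encoding of expressions, and the example automaton are assumptions made purely for illustration.

```python
EPS = "ε"  # the empty word; None stands for the empty language


def union(r, s):
    if r is None:
        return s
    if s is None:
        return r
    return r if r == s else f"({r}+{s})"


def concat(r, s):
    if r is None or s is None:
        return None
    if r == EPS:
        return s
    if s == EPS:
        return r
    return r + s  # safe: unions and stars are always parenthesized


def star(r):
    if r is None or r == EPS:
        return EPS
    return f"({r})*"


def state_elimination(delta, start, finals, order):
    """Convert a DFA to a regular expression by eliminating states in `order`.

    delta maps (state, symbol) -> state; `order` must list every state once.
    Assumes no state is named "s" or "f" (used as fresh start/final states).
    """
    S, F = "s", "f"
    R = {}  # R[(p, q)] = expression labelling the edge p -> q
    for (p, a), q in delta.items():
        R[(p, q)] = union(R.get((p, q)), a)
    R[(S, start)] = EPS
    for q in finals:
        R[(q, F)] = EPS
    for k in order:
        loop = star(R.get((k, k)))
        ins = [(p, r) for (p, q), r in R.items() if q == k and p != k]
        outs = [(q, r) for (p, q), r in R.items() if p == k and q != k]
        # Reroute every path p -> k -> q around the eliminated state k.
        for p, r_in in ins:
            for q, r_out in outs:
                bypass = concat(concat(r_in, loop), r_out)
                R[(p, q)] = union(R.get((p, q)), bypass)
        R = {(p, q): r for (p, q), r in R.items() if k not in (p, q)}
    return R.get((S, F))


def alphabetic_width(regex, alphabet):
    """Number of alphabet-symbol occurrences in the expression."""
    return 0 if regex is None else sum(c in alphabet for c in regex)


# Example DFA over {a, b}: words whose number of a's is divisible by 3.
delta = {(i, "a"): (i + 1) % 3 for i in range(3)}
delta.update({(i, "b"): i for i in range(3)})

if __name__ == "__main__":
    for order in [(1, 2, 0), (0, 1, 2)]:
        regex = state_elimination(delta, 0, {0}, order)
        print(order, alphabetic_width(regex, "ab"), regex)
```

For this three-state automaton, eliminating the start state last (ordering 1, 2, 0) yields an expression of alphabetic width 6, whereas eliminating it first (ordering 0, 1, 2) yields width 13 for the same language. The sketch performs only trivial simplifications, so the widths it reports are upper bounds for the chosen ordering, not minimal expression sizes.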

[1] Derick Wood, et al. Obtaining shorter regular expressions from finite-state automata, 2007, Theor. Comput. Sci.

[2] Nelma Moreira, et al. Acyclic Automata with Easy-to-Find Short Regular Expressions, 2005, CIAA.

[3] B. A. Reed, et al. Algorithmic Aspects of Tree Width, 2003.

[4] P. Erdös. On an extremal problem in graph theory, 1970.

[5] Derick Wood, et al. Theory of computation, 1986.

[6] Jeffrey Shallit, et al. Regular Expressions: New Results and Open Problems, 2004, J. Autom. Lang. Comb.

[7] Ian Stark, et al. Free-Algebra Models for the pi-Calculus, 2005, FoSSaCS.

[8] Janusz A. Brzozowski, et al. Derivatives of Regular Expressions, 1964, JACM.

[9] Shirley Dex, et al. JR 旅客販売総合システム(マルス)における運用及び管理について, 1991.

[10] Michael Domaratzki, et al. State Complexity of Proportional Removals, 2002, J. Autom. Lang. Comb.

[11] Borivoj Melichar, et al. Finding Common Motifs with Gaps Using Finite Automata, 2006, CIAA.

[12] Harold V. McIntosh. REEX: A CONVERT Program to Realize the McNaughton-Yamada Analysis Algorithm, 1968.

[13] R. Tarjan, et al. A Separator Theorem for Planar Graphs, 1977.

[14] Andrzej Ehrenfeucht, et al. Complexity measures for regular expressions, 1974, STOC '74.

[15] M. W. Shields. An Introduction to Automata Theory, 1988.

[16] Markus Holzer, et al. Finite Automata, Digraph Connectivity, and Regular Expression Size, 2008, ICALP.

[17] Paul D. Seymour, et al. Graph Minors. II. Algorithmic Aspects of Tree-Width, 1986, J. Algorithms.

[18] Robert E. Filman, et al. GOTO removal based on regular expressions, 1997.

[19] Lucian Ilie, et al. Follow automata, 2003, Inf. Comput.

[20] Karl-Heinz Schmelovsky. Probleme der Bayesschen Schätzung bei zeitdiskreter Beobachtung, 1974, J. Inf. Process. Cybern.

[21] B. Mohar, et al. Graph Minors, 2009.

[22] Jacques Sakarovitch, et al. The Language, the Expression, and the (Small) Automaton, 2005, CIAA.

[23] S. C. Kleene, et al. Representation of Events in Nerve Nets and Finite Automata, 1951.

[24] G. Schnitger. Regular expressions and NFAs without ε-transitions, 2006.

[25] Graham Farr, et al. Planarization and fragmentability of some classes of graphs, 2008, Discret. Math.

[26] Manuel Delgado, et al. Approximation to the Smallest Regular Expression for a Given Regular Language, 2004, CIAA.

[27] Markus Holzer, et al. Language Operations with Regular Expressions of Polynomial Size, 2008, DCFS.

[28] Fan Chung, et al. Spectral Graph Theory, 1996.

[29] Henning Fernau, et al. Local elimination-strategies in automata for shorter regular expressions, 2008, SOFSEM.

[30] Jan Johannsen, et al. Optimal Lower Bounds on Regular Expression Size Using Communication Complexity, 2008, FoSSaCS.

[31] Robert McNaughton, et al. Regular Expressions and State Graphs for Automata, 1960, IRE Trans. Electron. Comput.

[32] Jaikumar Radhakrishnan, et al. Greed is good: Approximating independent sets in sparse and bounded-degree graphs, 1997, Algorithmica.

[33] Wouter Gelade. Succinctness of regular expressions with interleaving, intersection and counting, 2010, Theor. Comput. Sci.

[34] Albert R. Meyer, et al. Word problems requiring exponential time (Preliminary Report), 1973, STOC.