Inference of regular languages using state merging algorithms with search

State merging algorithms have emerged as the solution of choice for inferring regular grammars from labeled samples, a known NP-complete problem of great importance in grammatical inference. These methods derive a small deterministic finite automaton (DFA) from a set of labeled strings (the training set) by merging states of the prefix tree acceptor that corresponds to this training set. Experimental and theoretical evidence has shown that the generalization ability of the resulting automata depends strongly on the number of states in the final solution, with smaller automata generalizing better. As originally proposed, state merging algorithms do not perform search. This makes them fast, but it also means their results are only as good as the heuristics they use to select which states to merge. Sub-optimal choices lead to automata that have many more states than needed and generalize poorly. In this work, we survey existing approaches that generalize state merging algorithms by using search to explore the tree of possible sequences of state merges. Heuristic-guided search in this space makes it possible to consider many state merging sequences, leading to smaller automata and improved generalization at the expense of increased computation time. We present comparisons of existing algorithms showing that, on widely accepted benchmarks, this type of search improves the quality of the derived solutions. However, we also point out that existing algorithms are not powerful enough to solve the more complex instances of the problem, which suggests that better and more powerful approaches still need to be designed.
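The ideas above (a prefix tree acceptor, state merges that must keep the automaton deterministic and consistent, and search over sequences of merges) can be illustrated with a minimal Python sketch. This is not any of the surveyed algorithms. All names (`build_apta`, `merge`, `search`, `accepts`, `width`) are hypothetical. The ranking heuristic, which prefers the merge that leaves the fewest states, is an assumption chosen for brevity, not EDSM or any published scoring rule. The search is a depth-first walk of the merge-sequence tree that expands only the `width` best-ranked children of each node, so `width=1` reduces to greedy state merging without search.

```python
from itertools import combinations


def build_apta(samples):
    """Augmented prefix tree acceptor (APTA) for (word, accepted) pairs, returned
    as an initial hypothesis in which every APTA state is its own block."""
    delta, label = [{}], [None]  # label: True = accept, False = reject, None = unknown
    for word, accepted in samples:
        q = 0
        for a in word:
            if a not in delta[q]:
                delta[q][a] = len(delta)
                delta.append({})
                label.append(None)
            q = delta[q][a]
        if label[q] is not None and label[q] != accepted:
            raise ValueError(f"contradictory labels for {word!r}")
        label[q] = accepted
    n = len(delta)
    # Hypothesis = (union-find parent array, label per block, transitions per block).
    return list(range(n)), {i: label[i] for i in range(n)}, {i: dict(delta[i]) for i in range(n)}


def find(parent, x):
    while parent[x] != x:
        x = parent[x]
    return x


def size(hyp):
    return len(hyp[1])  # number of blocks = number of DFA states


def merge(hyp, p, q):
    """Merge the blocks of p and q, folding successors so the result stays
    deterministic. Returns None if an accepted and a rejected string would
    end in the same state."""
    parent, blabel = list(hyp[0]), dict(hyp[1])
    bdelta = {r: dict(t) for r, t in hyp[2].items()}
    pending = [(p, q)]
    while pending:
        a, b = pending.pop()
        ra, rb = find(parent, a), find(parent, b)
        if ra == rb:
            continue
        la, lb = blabel.pop(ra), blabel.pop(rb)
        if la is not None and lb is not None and la != lb:
            return None
        parent[rb] = ra
        blabel[ra] = la if la is not None else lb
        for sym, child in bdelta.pop(rb).items():
            if sym in bdelta[ra]:
                pending.append((bdelta[ra][sym], child))  # same symbol: targets must merge
            else:
                bdelta[ra][sym] = child
    return parent, blabel, bdelta


def canonical(hyp):
    """Partition key that does not depend on which member is the block representative."""
    reps = [find(hyp[0], i) for i in range(len(hyp[0]))]
    first = {}
    for i, r in enumerate(reps):
        first.setdefault(r, i)
    return tuple(first[r] for r in reps)


def search(hyp, width, best=None, seen=None):
    """Depth-first search over merge sequences, expanding only the `width`
    children with the fewest states (width=1 is plain greedy merging)."""
    if best is None:
        best, seen = [hyp], set()
    key = canonical(hyp)
    if key in seen:  # same partition reached by a different merge order
        return best[0]
    seen.add(key)
    if size(hyp) < size(best[0]):
        best[0] = hyp
    children = (merge(hyp, p, q) for p, q in combinations(sorted(hyp[1]), 2))
    ranked = sorted((c for c in children if c is not None), key=size)
    for child in ranked[:width]:
        search(child, width, best, seen)
    return best[0]


def accepts(hyp, word):
    """Predicted label of `word`, or None if it leaves the learned automaton."""
    parent, blabel, bdelta = hyp
    r = find(parent, 0)
    for a in word:
        if a not in bdelta[r]:
            return None
        r = find(parent, bdelta[r][a])
    return blabel[r]


samples = [("", True), ("a", False), ("aa", True), ("aaa", False), ("aaaa", True)]
dfa = search(build_apta(samples), width=2)
print(size(dfa), accepts(dfa, "aaaaaa"), accepts(dfa, "aaaaa"))  # prints: 2 True False
```

Every hypothesis in this sketch groups APTA states and rejects label conflicts, so whatever the search returns remains consistent with the training set. The greedy path is always the first branch explored. As a result, a larger `width` can only return an automaton as small as or smaller than the greedy one, and the price is many more merge attempts.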
