Paper information - Inference of regular languages using state merging algorithms with search

Inference of regular languages using state merging algorithms with search

State merging algorithms have emerged as the solution of choice for the problem of inferring regular grammars from labeled samples, a known NP-complete problem of great importance in the grammatical inference area. These methods derive a small deterministic finite automaton from a set of labeled strings (the training set) by merging parts of the acceptor that corresponds to this training set. Experimental and theoretical evidence has shown that the generalization ability of the resulting automata is highly correlated with the number of states in the final solution. As originally proposed, state merging algorithms do not perform search. This makes them fast, but it also means that they are limited by the quality of the heuristics they use to select the states to be merged. Sub-optimal choices lead to automata that have many more states than needed and that generalize poorly. In this work, we survey the existing approaches that generalize state merging algorithms by using search to explore the tree that represents the space of possible sequences of state mergings. With heuristic guided search in this space, many possible state merging sequences can be considered, leading to smaller automata and improved generalization ability at the expense of increased computation time. We present comparisons of existing algorithms showing that, on widely accepted benchmarks, this type of search improves the quality of the derived solutions. However, we also point out that existing algorithms are not powerful enough to solve the more complex instances of the problem, which suggests that better and more powerful approaches may still need to be designed.
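As a rough illustration of the state merging idea described in the abstract, the following Python sketch builds the prefix tree acceptor for a training set and then merges states greedily, accepting the first merge that stays consistent with the labels. It is a minimal sketch under stated assumptions, not any of the surveyed algorithms: it performs no search, the merge order is simply state-creation order, and all names (`build_pta`, `try_merge`, `greedy_merge`, `accepts`, `count_states`) are hypothetical.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical representation: transitions, labels (True/False/None), union-find parents.
DFA = Tuple[List[Dict[str, int]], List[Optional[bool]], List[int]]


def find(parent: List[int], x: int) -> int:
    """Return the representative of x's merged class (with path compression)."""
    while parent[x] != x:
        parent[x] = parent[parent[x]]
        x = parent[x]
    return x


def build_pta(samples: List[Tuple[str, bool]]) -> DFA:
    """Build the prefix tree acceptor: one state per prefix of a training string."""
    trans: List[Dict[str, int]] = [{}]
    label: List[Optional[bool]] = [None]
    for word, accepted in samples:
        state = 0
        for sym in word:
            if sym not in trans[state]:
                trans.append({})
                label.append(None)
                trans[state][sym] = len(trans) - 1
            state = trans[state][sym]
        label[state] = accepted
    return trans, label, list(range(len(trans)))


def try_merge(dfa: DFA, a: int, b: int) -> Optional[DFA]:
    """Merge a and b on a copy, folding successors to keep the automaton
    deterministic. Returns None if an accepting and a rejecting state collide."""
    trans = [dict(t) for t in dfa[0]]
    label = list(dfa[1])
    parent = list(dfa[2])
    pending = [(a, b)]
    while pending:
        x, y = pending.pop()
        x, y = find(parent, x), find(parent, y)
        if x == y:
            continue
        if label[x] is not None and label[y] is not None and label[x] != label[y]:
            return None  # merge would contradict the training labels
        if x > y:
            x, y = y, x  # keep the smaller index, so state 0 stays the root
        parent[y] = x
        if label[x] is None:
            label[x] = label[y]
        for sym, target in trans[y].items():
            if sym in trans[x]:
                pending.append((trans[x][sym], target))  # successors must merge too
            else:
                trans[x][sym] = target
    return trans, label, parent


def greedy_merge(samples: List[Tuple[str, bool]]) -> DFA:
    """Greedy merging without search: take the first consistent merge found."""
    dfa = build_pta(samples)
    for j in range(1, len(dfa[0])):
        if find(dfa[2], j) != j:
            continue  # j was already merged into an earlier state
        for i in range(j):
            if find(dfa[2], i) != i:
                continue
            merged = try_merge(dfa, i, j)
            if merged is not None:
                dfa = merged
                break
    return dfa


def accepts(dfa: DFA, word: str) -> bool:
    """Run the merged automaton; missing transitions and unlabeled states reject."""
    trans, label, parent = dfa
    state = find(parent, 0)
    for sym in word:
        if sym not in trans[state]:
            return False
        state = find(parent, trans[state][sym])
    return label[state] is True


def count_states(dfa: DFA) -> int:
    """Number of merged classes, i.e. states of the resulting automaton."""
    return sum(1 for s in range(len(dfa[0])) if find(dfa[2], s) == s)
```

For instance, calling `greedy_merge` on strings over the single symbol `a`, labeled positive when their length is even, collapses the five-state prefix tree into a two-state automaton. A search-based variant in the spirit of the surveyed approaches would, instead of committing to the first consistent merge, branch on alternative merge choices and backtrack; that part is deliberately left out here.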

Arlindo L. Oliveira | Miguel M. F. Bugalho

[1] Dana Angluin, et al. Learning Regular Sets from Queries and Counterexamples, 1987, Inf. Comput.

[2] Andreas Stolcke, et al. Bayesian learning of probabilistic language models, 1994.

[3] Arlindo Manuel Oliveira. Inductive learning by selection of minimal complexity representations, 1994.

[4] Robert E. Schapire, et al. Design and analysis of efficient learning algorithms, 1992, ACM Doctoral dissertation award; 1991.

[5] Leonard Pitt, et al. The minimum consistent DFA problem cannot be approximated within any polynomial, 1993, JACM.

[6] David Haussler, et al. Occam's Razor, 1987, Inf. Process. Lett.

[7] Joao Marques-Silva, et al. Efficient Algorithms for the Inference of Minimum Size DFAs, 2001, Machine Learning.

[8] Jordan B. Pollack, et al. A Stochastic Search Approach to Grammar Induction, 1998, ICGI.

[9] Gerald J. Sussman, et al. Forward Reasoning and Dependency-Directed Backtracking in a System for Computer-Aided Circuit Analysis, 1976, Artif. Intell.

[10] Jacques Nicolas, et al. How Considering Incompatible State Mergings May Reduce the DFA Induction Search Tree, 1998, ICGI.

[11] E. Mark Gold, et al. System identification via state characterization, 1972.