Beyond EDSM

In this paper, we analyze the effectiveness of a leading finite state automaton (FSA) induction algorithm, windowed evidence-driven state merging (W-EDSM). W-EDSM generates small automata that correctly label a given set of positive and a given set of negative example strings drawn from a regular (Type 3) language. In particular, W-EDSM builds a prefix tree from the exemplars, which is then collapsed into an FSA by repeatedly selecting pairs of nodes to merge according to a simple heuristic until no more merges are possible. Our experimental results show that this heuristic works well for later merges but poorly for early merges. Based on this observation, we make a small modification to W-EDSM that improves the performance of the algorithm by 27%, and we suggest other avenues for further enhancement.
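The prefix-tree-then-merge procedure described above can be illustrated with a small sketch. This is plain, unwindowed evidence-driven state merging written for illustration, not the authors' W-EDSM: W-EDSM additionally restricts candidate merges to a window of states near the root, which is omitted here. The evidence score used (+1 each time two blocks carrying the same label are folded together, invalid if an accepting and a rejecting node collide) is a simplified assumption, and all names (`build_apta`, `merge`, `edsm`, `accepts`) are hypothetical.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical representation: a hypothesis is a partition of prefix-tree nodes,
# stored as (union-find parent, label per block, transitions per block).
State = Tuple[List[int], Dict[int, Optional[bool]], Dict[int, Dict[str, int]]]


def build_apta(positives: List[str], negatives: List[str]) -> State:
    """Build the prefix tree: one node per distinct prefix, labels on word ends."""
    children: List[Dict[str, int]] = [{}]
    labels: List[Optional[bool]] = [None]
    examples = [(w, True) for w in positives] + [(w, False) for w in negatives]
    for word, accept in examples:
        node = 0
        for sym in word:
            if sym not in children[node]:
                children[node][sym] = len(children)
                children.append({})
                labels.append(None)
            node = children[node][sym]
        labels[node] = accept  # node reached by the whole word carries its label
    parent = list(range(len(labels)))
    return parent, dict(enumerate(labels)), {i: dict(c) for i, c in enumerate(children)}


def find(parent: List[int], x: int) -> int:
    """Return the representative of the block containing node x."""
    while parent[x] != x:
        x = parent[x]
    return x


def merge(state: State, a: int, b: int) -> Optional[int]:
    """Fold the blocks of a and b (and, recursively, their successors) in place.

    Returns the simplified evidence score, or None if an accepting and a
    rejecting node would end up in the same block.
    """
    parent, lab, trans = state
    score = 0
    stack = [(a, b)]
    while stack:
        x, y = stack.pop()
        rx, ry = find(parent, x), find(parent, y)
        if rx == ry:
            continue
        lx, ly = lab[rx], lab[ry]
        if lx is not None and ly is not None:
            if lx != ly:
                return None  # label conflict: merge is inconsistent with the data
            score += 1  # two labeled blocks agree: one unit of evidence
        parent[ry] = rx
        lab[rx] = lx if lx is not None else ly
        del lab[ry]
        tx = trans[rx]
        for sym, target in trans.pop(ry).items():
            if sym in tx:
                stack.append((tx[sym], target))  # keep the automaton deterministic
            else:
                tx[sym] = target
    return score


def copy_state(state: State) -> State:
    parent, lab, trans = state
    return parent[:], dict(lab), {r: dict(t) for r, t in trans.items()}


def edsm(positives: List[str], negatives: List[str]) -> State:
    """Greedily apply the highest-scoring consistent merge until none remain."""
    state = build_apta(positives, negatives)
    while True:
        reps = sorted(state[1])  # representatives of the current blocks
        best: Optional[Tuple[int, State]] = None
        for i, a in enumerate(reps):
            for b in reps[i + 1:]:
                trial = copy_state(state)
                score = merge(trial, a, b)
                if score is not None and (best is None or score > best[0]):
                    best = (score, trial)
        if best is None:
            return state
        state = best[1]


def accepts(state: State, word: str) -> bool:
    parent, lab, trans = state
    block = find(parent, 0)
    for sym in word:
        nxt = trans[block].get(sym)
        if nxt is None:
            return False  # assumption: a missing transition rejects
        block = find(parent, nxt)
    return lab[block] is True  # assumption: an unlabeled block rejects


if __name__ == "__main__":
    pos = ["", "aa", "b", "bb", "aab", "aba", "baa"]  # even number of a's
    neg = ["a", "ab", "ba", "aaa", "bab"]
    dfa = edsm(pos, neg)
    print(len(dfa[1]), "states")
```

Because every merge folds successors together and rejects label conflicts, the resulting automaton still labels all training strings correctly; only the choice of which merge to take first depends on the heuristic, which is the point the abstract identifies as weak for early merges.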
