Learning SCFGs from Corpora by a Genetic Algorithm

A genetic algorithm for inferring stochastic context-free grammars (SCFGs) from finite language samples is described. Solutions to the inference problem are found by optimizing the parameters of a covering grammar for a given language sample. We describe a number of experiments in learning grammars for a range of formal languages. The results of these experiments are encouraging and compare very favourably with other approaches to stochastic grammatical inference.
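The abstract does not specify the encoding, so the following Python sketch is only one plausible reading of "optimizing the parameters of a covering grammar". It rests on assumptions that are ours, not the paper's. The covering grammar is taken to be every Chomsky-normal-form rule over a fixed set of nonterminals. A chromosome is a vector of positive rule weights, normalized per left-hand side. Fitness is the log-likelihood of the sample, computed with the inside algorithm. Truncation selection with elitism, one-point crossover and log-normal mutation are likewise illustrative choices. All names (`covering_rules`, `decode`, `inside_prob`, `fitness`, `evolve`) and the toy grammar are hypothetical.

```python
import math
import random
from itertools import product

# Hypothetical toy setup (not from the paper): two nonterminals, two terminals.
NONTERMS = ["S", "X"]
TERMS = ["a", "b"]


def covering_rules(nonterms, terms):
    """Every CNF rule A -> B C and A -> t, grouped by left-hand side."""
    return {
        a: [(a, bc) for bc in product(nonterms, repeat=2)] + [(a, (t,)) for t in terms]
        for a in nonterms
    }


RULES = covering_rules(NONTERMS, TERMS)
GENOME_LEN = sum(len(rs) for rs in RULES.values())


def decode(genome):
    """Turn positive raw weights into rule probabilities normalized per left-hand side."""
    probs, i = {}, 0
    for a in NONTERMS:
        weights = genome[i:i + len(RULES[a])]
        total = sum(weights)
        for rule, w in zip(RULES[a], weights):
            probs[rule] = w / total
        i += len(RULES[a])
    return probs


def inside_prob(sentence, probs, start="S"):
    """Inside (CKY-style) algorithm: P(start =>* sentence) under a CNF SCFG."""
    n = len(sentence)
    chart = {}  # chart[(i, j, A)] = P(A =>* sentence[i:j])
    for i, tok in enumerate(sentence):
        for a in NONTERMS:
            chart[(i, i + 1, a)] = probs.get((a, (tok,)), 0.0)
    for span in range(2, n + 1):
        for i in range(n - span + 1):
            j = i + span
            for a in NONTERMS:
                chart[(i, j, a)] = sum(
                    probs[(a, (b, c))] * chart[(i, k, b)] * chart[(k, j, c)]
                    for k in range(i + 1, j)
                    for b in NONTERMS
                    for c in NONTERMS
                )
    return chart.get((0, n, start), 0.0)


def fitness(genome, sample):
    """Assumed fitness: log-likelihood of the whole sample (higher is better)."""
    probs = decode(genome)
    total = 0.0
    for s in sample:
        p = inside_prob(s, probs)
        total += math.log(p) if p > 0 else -1e9  # heavy penalty if s is not derivable
    return total


def evolve(sample, pop_size=30, generations=40, sigma=0.3, seed=0):
    """Truncation selection + elitism, one-point crossover, log-normal mutation."""
    rng = random.Random(seed)
    pop = [[rng.random() + 1e-3 for _ in range(GENOME_LEN)] for _ in range(pop_size)]
    for _ in range(generations):
        scored = sorted(pop, key=lambda g: fitness(g, sample), reverse=True)
        children = list(scored[: pop_size // 5])  # elites survive unchanged
        while len(children) < pop_size:
            p1, p2 = rng.sample(scored[: pop_size // 2], 2)
            cut = rng.randrange(1, GENOME_LEN)
            child = p1[:cut] + p2[cut:]
            # Multiplicative noise keeps every weight strictly positive.
            child = [w * math.exp(rng.gauss(0, sigma)) if rng.random() < 0.1 else w
                     for w in child]
            children.append(child)
        pop = children
    best = max(pop, key=lambda g: fitness(g, sample))
    return decode(best), fitness(best, sample)
```

For example, `evolve(["ab", "aabb", "aaabbb"])` searches for rule probabilities that make a small a^n b^n sample likely. Normalizing inside `decode` means the rule probabilities for each left-hand side always sum to one, so crossover and mutation never need a repair step.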
