Stochastic Context-Free Grammar Induction with a Genetic Algorithm Using Local Search

We have previously used grammars as a formalism to structure a GA's search for programs called sorting networks (SNets) [KBW95]. In this paper we restrict ourselves to stochastic context-free grammars which, while more analytically tractable than our SNet grammars, are more difficult than others previously considered by the GA community. In our approach, the production rules of a grammar are encoded as genes of a genome; the resulting grammar is used as a recognizer of strings and is assigned a fitness measure that reflects the probability that it captures the structure of a restricted sample of strings generated by a stochastic target language. Our GA introduces a novel encoding of grammars as genotypic strings, and uses a local search component to aid in learning rule probabilities. Both fitness evaluation and the local search algorithm depend on a sophisticated chart parser. We give results for two simple grammars whose nonstochastic equivalents have been used in a previous study. We also present arguments about the degree of testing needed for GA-based grammar induction.
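
As an illustration only, the following sketch shows one way a rule-per-gene encoding and a likelihood-based fitness could fit together. The gene layout, the restriction to Chomsky normal form, the toy grammar, and the names GENOME, string_probability, and fitness are all assumptions made for this example. The inside algorithm is used here as a simple stand-in for a chart parser; it is not the encoding or the parser described in this paper.

```python
import math
from collections import defaultdict

# Hypothetical genome: one gene per production rule, each carrying a probability.
# Binary rule:  (lhs, (B, C), p)      Lexical rule: (lhs, (terminal,), p)
# Toy grammar (an assumption): S -> A B, A -> a, B -> b | A B, generating a^n b.
GENOME = [
    ("S", ("A", "B"), 1.0),
    ("A", ("a",), 1.0),
    ("B", ("b",), 0.5),
    ("B", ("A", "B"), 0.5),
]


def string_probability(genome, s, start="S"):
    """Probability that the grammar derives s, via the inside algorithm (CNF only)."""
    n = len(s)
    if n == 0:
        return 0.0
    # inside[(i, j, X)] = probability that nonterminal X derives s[i:j]
    inside = defaultdict(float)
    for i, symbol in enumerate(s):
        for lhs, rhs, p in genome:
            if rhs == (symbol,):
                inside[(i, i + 1, lhs)] += p
    for span in range(2, n + 1):
        for i in range(n - span + 1):
            j = i + span
            for lhs, rhs, p in genome:
                if len(rhs) != 2:
                    continue
                left_nt, right_nt = rhs
                for k in range(i + 1, j):
                    left = inside.get((i, k, left_nt), 0.0)
                    right = inside.get((k, j, right_nt), 0.0)
                    if left > 0.0 and right > 0.0:
                        inside[(i, j, lhs)] += p * left * right
    return inside.get((0, n, start), 0.0)


def fitness(genome, sample):
    """Log-likelihood of the sample; -inf if any string cannot be derived."""
    total = 0.0
    for s in sample:
        p = string_probability(genome, s)
        if p == 0.0:
            return float("-inf")
        total += math.log(p)
    return total


if __name__ == "__main__":
    sample = ["ab", "aab"]
    print(string_probability(GENOME, "ab"))
    print(fitness(GENOME, sample))
```

Under these assumptions, a genome whose rules assign higher probability to the sampled strings receives a higher fitness, and a local search step could adjust only the probability field of each gene while leaving the rule structure to the GA.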