Bayesian Grammar Induction for Language Modeling

We describe a corpus-based induction algorithm for probabilistic context-free grammars. The algorithm employs a greedy heuristic search within a Bayesian framework, followed by a post-pass using the Inside-Outside algorithm. We compare the performance of our algorithm to n-gram models and the Inside-Outside algorithm on three language modeling tasks. In two of the tasks, the training data is generated by a probabilistic context-free grammar, and in both our algorithm outperforms the other techniques. The third task involves naturally occurring data; here our algorithm does not perform as well as n-gram models but vastly outperforms the Inside-Outside algorithm.
