Paper Information - Bayesian Grammar Induction for Language Modeling

Bayesian Grammar Induction for Language Modeling

We describe a corpus-based induction algorithm for probabilistic context-free grammars. The algorithm employs a greedy heuristic search within a Bayesian framework, and a post-pass using the Inside-Outside algorithm. We compare the performance of our algorithm to n-gram models and the Inside-Outside algorithm in three language modeling tasks. In two of the tasks, the training data is generated by a probabilistic context-free grammar and in both tasks our algorithm outperforms the other techniques. The third task involves naturally-occurring data, and in this task our algorithm does not perform as well as n-gram models but vastly outperforms the Inside-Outside algorithm.
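
As a rough illustration of what searching "within a Bayesian framework" usually means for grammar induction, the objective can be read as finding the grammar with the highest posterior probability given the training data. The notation below (G for a candidate grammar, O for the training data) is an assumption made for this sketch and is not taken from the abstract.

```latex
G^{*} \;=\; \arg\max_{G}\, p(G \mid O)
      \;=\; \arg\max_{G}\, \frac{p(G)\, p(O \mid G)}{p(O)}
      \;=\; \arg\max_{G}\, p(G)\, p(O \mid G)
```

Here p(G) is a prior over grammars and p(O | G) is the likelihood of the training data under the grammar; p(O) does not depend on G, so it drops out of the maximization. Under this reading, a greedy search would keep a change to the grammar only when it increases the product p(G) p(O | G). The specific prior and the set of search moves are not stated in the abstract.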

Stanley F. Chen
