Inducing Probabilistic Grammars by Bayesian Model Merging

We describe a framework for inducing probabilistic grammars from corpora of positive samples. First, samples are incorporated by adding ad hoc rules to a working grammar; subsequently, elements of the model (such as states or nonterminals) are merged to achieve generalization and a more compact representation. The choice of what to merge and when to stop is governed by the Bayesian posterior probability of the grammar given the data, which formalizes a trade-off between a close fit to the data and a default preference for simpler models ('Occam's Razor'). The general scheme is illustrated using three types of probabilistic grammars: hidden Markov models, class-based n-grams, and stochastic context-free grammars.
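For concreteness, below is a minimal sketch (not the paper's implementation) of this scheme for the HMM case: each sample is first incorporated as a dedicated chain of states, and pairs of states are then greedily merged as long as the log posterior log P(G) + log P(D | G) improves. It assumes a simple description-length-style prior with an illustrative weight `alpha`, and it scores the likelihood by pooling counts along the samples' original paths (a Viterbi-style approximation); all function names are hypothetical.

```python
# A toy sketch of Bayesian model merging for HMMs (not the paper's code).
import math
from collections import defaultdict

def incorporate(samples):
    """Data incorporation: build one dedicated state chain per sample.
    States are integers; 0 is the start state, 1 the final state.
    Returns transition counts (state -> next -> n) and emission
    counts (state -> symbol -> n)."""
    trans = defaultdict(lambda: defaultdict(int))
    emit = defaultdict(lambda: defaultdict(int))
    next_id = 2
    for sample in samples:
        prev = 0
        for sym in sample:
            trans[prev][next_id] += 1
            emit[next_id][sym] += 1
            prev = next_id
            next_id += 1
        trans[prev][1] += 1
    return trans, emit

def log_likelihood(trans, emit):
    """log P(D | G) under a Viterbi-style approximation: samples keep
    their original paths, and each count is scored against the
    maximum-likelihood multinomial of its (possibly merged) state."""
    ll = 0.0
    for counts in list(trans.values()) + list(emit.values()):
        total = sum(counts.values())
        ll += sum(c * math.log(c / total) for c in counts.values())
    return ll

def log_prior(trans, emit, alpha):
    """log P(G): a description-length-style structural prior (an
    assumption here, one of several priors possible) that charges
    `alpha` nats per state and per distinct transition/emission entry,
    so fewer, more shared states score higher."""
    size = len(emit) + sum(len(d) for d in trans.values()) \
                     + sum(len(d) for d in emit.values())
    return -alpha * size

def merged(trans, emit, keep, drop):
    """Return new count tables with state `drop` folded into `keep`."""
    rename = lambda s: keep if s == drop else s
    t2 = defaultdict(lambda: defaultdict(int))
    e2 = defaultdict(lambda: defaultdict(int))
    for s, row in trans.items():
        for t, c in row.items():
            t2[rename(s)][rename(t)] += c
    for s, row in emit.items():
        for sym, c in row.items():
            e2[rename(s)][sym] += c
    return t2, e2

def model_merge(samples, alpha=1.0):
    """Best-first merging: greedily apply the merge that most improves
    the log posterior; stop when no candidate merge improves it."""
    trans, emit = incorporate(samples)
    score = log_prior(trans, emit, alpha) + log_likelihood(trans, emit)
    while True:
        states = list(emit)  # emitting states only; start/end never merge
        best = None
        for i in range(len(states)):
            for j in range(i + 1, len(states)):
                t2, e2 = merged(trans, emit, states[i], states[j])
                s2 = log_prior(t2, e2, alpha) + log_likelihood(t2, e2)
                if best is None or s2 > best[0]:
                    best = (s2, t2, e2)
        if best is None or best[0] <= score:
            return trans, emit, score
        score, trans, emit = best

# Example: repeated structure in the samples lets chains be merged.
trans, emit, score = model_merge(["ab", "ab", "aab", "aaab"], alpha=0.5)
print(len(emit), "emitting states, log posterior", round(score, 2))
```

The same skeleton carries over to the other model classes in the abstract by changing what a merge operates on, e.g. merging word classes of an n-gram model or nonterminals of a stochastic context-free grammar instead of HMM states.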
