Bayesian Belief Networks as a tool for stochastic parsing

Abstract: Bayesian Belief Networks are a powerful tool for combining different knowledge sources with various degrees of uncertainty in a mathematically sound and computationally efficient way. Surprisingly, they have not yet found their way into the speech processing field, even though multiple unreliable information sources exist in this field. The present paper shows how the theory can be utilized for language modeling. After providing an introduction to the theory of Bayesian Networks, we develop several extensions to the classic theory by describing mechanisms for dealing with statistical dependence among daughter nodes (usually assumed to be conditionally independent) and by providing a learning algorithm, based on the EM algorithm, with which the probabilities of link matrices can be learned from example data. Using these extensions, a language model for speech recognition based on a context-free framework is constructed. In this model, sentences are not parsed in their entirety, as is usual in grammatical description, but only “locally” on suitably located segments. The model was evaluated on a text database. In terms of test set entropy, the model performed at least as well as the bi/tri-gram models, while showing a good ability to generalize from training to test data.
