Structurally discriminative graphical models for automatic speech recognition - results from the 2001 Johns Hopkins Summer Workshop

In recent years there has been growing interest in discriminative parameter training techniques, driven by notable improvements in speech recognition performance on tasks ranging in size from digit recognition to Switchboard. Typified by Maximum Mutual Information training, these methods assume a fixed statistical modeling structure and then optimize only the associated numerical parameters (such as means, variances, and transition matrices). In this paper, we explore the significantly different methodology of discriminative structure learning. Here, the fundamental dependency relationships between random variables in a probabilistic model are learned in a discriminative fashion, separately from the numerical parameters. In order to apply the principles of structural discriminability, we adopt the framework of graphical models, which allows an arbitrary set of variables with arbitrary conditional independence relationships to be modeled at each time frame. We present results from the 2001 Johns Hopkins Summer Workshop, obtained using a new graphical modeling toolkit (described in a companion paper). These results indicate that significant gains result from discriminative structural analysis of both conventional MFCC and novel AM-FM features on the Aurora continuous digits task.
