Graphical model architectures for speech recognition

This article discusses the foundations of the use of graphical models for speech recognition as presented in J. R. Deller et al. (1993), X. D. Huang et al. (2001), F. Jelinek (19970, L. R. Rabiner and B. -H. Juang (1993) and S. Young et al. (1990) giving detailed accounts of some of the more successful cases. Our discussion employs dynamic Bayesian networks (DBNs) and a DBN extension using the Graphical Model Toolkit's (GMTK's) basic template, a dynamic graphical model representation that is more suitable for speech and language systems. While this article concentrates on speech recognition, it should be noted that many of the ideas presented here are also applicable to natural language processing and general time-series analysis.

[1]  C. N. Liu,et al.  Approximating discrete probability distributions with dependence trees , 1968, IEEE Trans. Inf. Theory.

[2]  Volume Assp,et al.  ACOUSTICS. SPEECH. AND SIGNAL PROCESSING , 1983 .

[3]  S. Furui On the role of spectral transition for speech perception. , 1986, The Journal of the Acoustical Society of America.

[4]  Peter F. Brown,et al.  The acoustic-modeling problem in automatic speech recognition , 1987 .

[5]  C. J. Wellekens,et al.  Explicit time correlation in hidden Markov models for speech recognition , 1987, ICASSP '87. IEEE International Conference on Acoustics, Speech, and Signal Processing.

[6]  Judea Pearl,et al.  Probabilistic reasoning in intelligent systems - networks of plausible inference , 1991, Morgan Kaufmann series in representation and reasoning.

[7]  John H. L. Hansen,et al.  Discrete-Time Processing of Speech Signals , 1993 .

[8]  Klaus-Uwe Höffgen,et al.  Learning and robust learning of product distributions , 1993, COLT '93.

[9]  Biing-Hwang Juang,et al.  Fundamentals of speech recognition , 1993, Prentice Hall signal processing series.

[10]  Steve Young,et al.  A review of large-vocabulary continuous-speech recognition , 1996 .

[11]  Jeff A. Bilmes,et al.  Stochastic perceptual speech models with durational dependence , 1996, Proceeding of Fourth International Conference on Spoken Language Processing. ICSLP '96.

[12]  Katrin Kirchhoff Syllable-level desynchronisation of phonetic features for speech recognition , 1996, Proceeding of Fourth International Conference on Spoken Language Processing. ICSLP '96.

[13]  Li Deng,et al.  Transitional speech units and their representation by regressive Markov states: applications to speech recognition , 1996, IEEE Trans. Speech Audio Process..

[14]  Steve Young,et al.  A review of large-vocabulary continuous-speech , 1996, IEEE Signal Process. Mag..

[15]  Frederick Jelinek,et al.  Statistical methods for speech recognition , 1997 .

[16]  Michael I. Jordan,et al.  Probabilistic Independence Networks for Hidden Markov Probability Models , 1997, Neural Computation.

[17]  Geoffrey Zweig,et al.  Speech Recognition with Dynamic Bayesian Networks , 1998, AAAI/IAAI.

[18]  Nikki Mirghafori,et al.  Combining connectionist multi-band and full-band probability streams for speech recognition of natural numbers , 1998, ICSLP.

[19]  David J. Spiegelhalter,et al.  Probabilistic Networks and Expert Systems , 1999, Information Science and Statistics.

[20]  Jeff A. Bilmes,et al.  Dynamic Bayesian Multinets , 2000, UAI.

[21]  Khalid Daoudi,et al.  A new approach for multi-band speech recognition based on probabilistic graphical models , 2000, INTERSPEECH.

[22]  Vladimir Pavlovic,et al.  Audio-visual speaker detection using dynamic Bayesian networks , 2000, Proceedings Fourth IEEE International Conference on Automatic Face and Gesture Recognition (Cat. No. PR00580).

[23]  Gérard Chollet,et al.  A Markov random field based multi-band model , 2000, 2000 IEEE International Conference on Acoustics, Speech, and Signal Processing. Proceedings (Cat. No.00CH37100).

[24]  Mehryar Mohri,et al.  The Design Principles of a Weighted Finite-State Transducer Library , 2000, Theor. Comput. Sci..

[25]  Brendan J. Frey,et al.  Factor graphs and the sum-product algorithm , 2001, IEEE Trans. Inf. Theory.

[26]  Alex Acero,et al.  Spoken Language Processing: A Guide to Theory, Algorithm and System Development , 2001 .

[27]  Finn V. Jensen,et al.  Bayesian Networks and Decision Graphs , 2001, Statistics for Engineering and Information Science.

[28]  P. Gehler,et al.  An introduction to graphical models , 2001 .

[29]  Khalid Daoudi,et al.  Structural learning of dynamic Bayesian networks in speech recognition , 2001, INTERSPEECH.

[30]  Stuart J. Russell,et al.  Dynamic bayesian networks: representation, inference and learning , 2002 .

[31]  Kevin P. Murphy,et al.  A coupled HMM for audio-visual speech recognition , 2002, 2002 IEEE International Conference on Acoustics, Speech, and Signal Processing.

[32]  X. Jin Factor graphs and the Sum-Product Algorithm , 2002 .

[33]  Nebojsa Jojic,et al.  Audio-Video Sensor Fusion with Probabilistic Graphical Models , 2002, ECCV.

[34]  Kevin P. Murphy,et al.  Dynamic Bayesian Networks for Audio-Visual Speech Recognition , 2002, EURASIP J. Adv. Signal Process..

[35]  Geoffrey Zweig,et al.  The graphical models toolkit: An open source software system for speech and time-series processing , 2002, 2002 IEEE International Conference on Acoustics, Speech, and Signal Processing.

[36]  Jeff A. Bilmes,et al.  Buried Markov models: a graphical-modeling approach to automatic speech recognition , 2003, Comput. Speech Lang..

[37]  Jeff A. Bilmes,et al.  Hidden-articulator Markov models for speech recognition , 2003, Speech Commun..

[38]  Jeff A. Bilmes,et al.  DBN based multi-stream models for speech , 2003, 2003 IEEE International Conference on Acoustics, Speech, and Signal Processing, 2003. Proceedings. (ICASSP '03)..

[39]  James R. Glass,et al.  Hidden feature models for speech recognition using dynamic Bayesian networks , 2003, INTERSPEECH.

[40]  Jeff A. Bilmes,et al.  On Triangulating Dynamic Graphical Models , 2002, UAI.

[41]  Geoffrey Zweig,et al.  Bayesian network structures and inference techniques for automatic speech recognition , 2003, Comput. Speech Lang..

[42]  J. Bilmes,et al.  Elimination is Not Enough: Non-Minimal Triangulations for Graphical Models , 2004 .

[43]  Jeff A. Bilmes,et al.  Graphical models and automatic speech recognition , 2002 .

[44]  James R. Glass,et al.  Feature-based pronunciation modeling with trainable asynchrony probabilities , 2004, INTERSPEECH.

[45]  Jeff A. Bilmes,et al.  DBN based multi-stream models for audio-visual speech recognition , 2004, 2004 IEEE International Conference on Acoustics, Speech, and Signal Processing.

[46]  Jeff A. Bilmes,et al.  PAC-learning Bounded Tree-width Graphical Models , 2004, UAI.

[47]  David Madigan,et al.  Probabilistic Temporal Reasoning , 2005, Handbook of Temporal Reasoning in Artificial Intelligence.

[48]  Mark Hasegawa-Johnson,et al.  Landmark-based speech recognition: report of the 2004 Johns Hopkins summer workshop , 2005, Proceedings. (ICASSP '05). IEEE International Conference on Acoustics, Speech, and Signal Processing, 2005..