What HMMs Can Do

Since their inception almost fifty years ago, hidden Markov models (HMMs) have become the predominant methodology for automatic speech recognition (ASR) systems; today, most state-of-the-art speech systems are HMM-based. There have been a number of ways to explain HMMs and to list their capabilities, each with its own advantages and disadvantages. In an effort to better understand what HMMs can do, this tutorial article analyzes HMMs by exploring a definition of HMMs in terms of random variables and conditional independence assumptions. We prefer this definition because it allows us to reason more thoroughly about the capabilities of HMMs. In particular, it is possible to deduce that there are, in theory at least, no limitations to the class of probability distributions representable by HMMs. This paper concludes that, in the search for a model to supersede the HMM (say, for ASR), rather than trying to correct for HMM limitations in the general case, new models should be chosen for their potential for better parsimony, lower computational requirements, and greater noise insensitivity.
