WHAT HMMS CAN'T DO

Hidden Markov models (HMMs) are the predominant methodology in automatic speech recognition (ASR) systems. Ever since their inception, HMMs have been criticized as a statistically inadequate model for this task. Results over the years have shown, however, that HMM-based ASR performance continues to improve given enough training data and engineering effort. In this paper, we argue that there are no theoretical limitations on the class of probability distributions an HMM can represent. A model intended to supersede the HMM for ASR should therefore be sought on other grounds: greater parsimony, better computational properties, reduced sensitivity to noise, and better use of high-level knowledge sources.
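To make the representability claim concrete, the distribution class in question is that defined by the standard HMM factorization; the notation below is the conventional one (initial distribution \pi, transition matrix a, emission densities b) rather than anything specific to this paper. An HMM with hidden state sequence q_{1:T} assigns to an observation sequence x_{1:T} the likelihood

\[
p(x_{1:T}) \;=\; \sum_{q_{1:T}} \pi_{q_1}\, b_{q_1}(x_1) \prod_{t=2}^{T} a_{q_{t-1} q_t}\, b_{q_t}(x_t).
\]

Roughly speaking, with enough hidden states and sufficiently flexible emission densities b_j, this family can approximate an essentially arbitrary distribution over observation sequences; this is the sense in which the abstract claims no theoretical limitation, leaving parsimony and computation, rather than expressiveness, as the grounds for objection.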
