Transformation streams and the HMM error model

The most popular model used in automatic speech recognition is the hidden Markov model (HMM). Though good performance has been obtained with such models, there are well-known limitations in their ability to model speech. A variety of modifications to the standard HMM topology have been proposed to handle these problems. One approach is the factorial HMM. This paper introduces a new form of factorial HMM which makes use of transformation streams. The new scheme is a generalization of the standard factorial HMM and other related schemes in speech processing. A particular form of this model, the HMM error model (HEM), is described in detail. The HEM is evaluated on two standard large-vocabulary speaker-independent speech recognition tasks. On both tasks, significant reductions in word error rate are obtained over standard HMM-based systems.
