Robust speech recognition and feature extraction using HMM2

This paper presents the theoretical basis and preliminary experimental results of a new hidden Markov model variant, referred to as HMM2, which can be considered a mixture of HMMs. In this model, the emission probabilities of the temporal (primary) HMM are estimated by secondary, state-specific HMMs operating in the acoustic feature space. Thus, while the primary HMM performs the usual time warping and integration, the secondary HMMs model possible dependencies among features and perform frequency warping and integration. Such a model has several potential advantages, including more flexible modeling of the time/frequency structure of the speech signal. When working with spectral features, the system can also perform nonlinear spectral warping, effectively implementing a form of nonlinear vocal tract normalization. Furthermore, it is shown that HMM2 can be used to extract noise-robust features, presumed to be related to formant regions, which can serve as additional features for traditional HMM recognizers to improve their performance. These issues are evaluated in the present paper, and experimental results are reported on the Numbers95 database.
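The role of a secondary HMM can be illustrated with a minimal sketch. It assumes that each secondary state emits one scalar spectral coefficient from a single Gaussian, and that the emission score of a primary state is the forward-algorithm likelihood of one frame, taken along the frequency axis. The function names, the scalar-Gaussian choice, and the toy parameters are illustrative assumptions, not the paper's configuration.

```python
import math


def log_gauss(x, mean, var):
    """Log density of a scalar Gaussian (assumed emission model)."""
    return -0.5 * (math.log(2.0 * math.pi * var) + (x - mean) ** 2 / var)


def logsumexp(values):
    """Numerically stable log(sum(exp(v))) for a list of log values."""
    m = max(values)
    if m == -math.inf:
        return -math.inf
    return m + math.log(sum(math.exp(v - m) for v in values))


def secondary_log_likelihood(frame, log_init, log_trans, means, variances):
    """Forward algorithm along the frequency axis of a single frame.

    frame          : spectral coefficients ordered from low to high frequency
    log_init[r]    : log P(first secondary state = r)
    log_trans[r][s]: log P(next secondary state = s | current state = r)
    means[r], variances[r]: scalar Gaussian of secondary state r

    Returns log p(frame | primary state), summed over all secondary paths.
    """
    n = len(log_init)
    alpha = [log_init[r] + log_gauss(frame[0], means[r], variances[r])
             for r in range(n)]
    for x in frame[1:]:
        alpha = [
            logsumexp([alpha[r] + log_trans[r][s] for r in range(n)])
            + log_gauss(x, means[s], variances[s])
            for s in range(n)
        ]
    return logsumexp(alpha)


# Toy left-to-right secondary HMM with two "frequency regions" (hypothetical values).
NEG_INF = -math.inf
log_init = [0.0, NEG_INF]                       # always start in region 0
log_trans = [[math.log(0.5), math.log(0.5)],    # stay in region 0 or move up
             [NEG_INF, 0.0]]                    # region 1 is absorbing
means, variances = [0.0, 5.0], [1.0, 1.0]

matched = secondary_log_likelihood([0.0, 0.0, 5.0, 5.0],
                                   log_init, log_trans, means, variances)
mismatched = secondary_log_likelihood([5.0, 5.0, 0.0, 0.0],
                                      log_init, log_trans, means, variances)
```

In this sketch the point along the frequency axis at which the secondary HMM switches states is not fixed in advance, which is one way to picture the frequency warping described above. In a full system, a value such as `matched` would stand in for the emission log-probability of one primary state at one time step.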
