A Pragmatic View of the Application of HMM2 for ASR

This report investigates the HMM2 approach recently introduced for automatic speech recognition (ASR). HMM2 can be seen as a mixture of HMMs, in which a conventional primary HMM, which processes a time series of speech data, is supported at a lower level by a secondary HMM that works along the frequency dimension of a temporal segment of speech. Applying HMM2 to the speech signal is motivated by numerous potential advantages. However, speech recognition results have not shown the expected performance improvements. In this report, the HMM2 approach is pragmatically analyzed and evaluated on speech data, revealing some problems and suggesting potential solutions.
