Hidden Markov Model and Neural Network Hybrid

When training and testing environments are mismatched, statistical pattern classification methods can suffer severe performance degradation because the classifier parameters no longer represent the testing data well. The mismatch is typically caused by interference or noise in the operating environment. In this paper, a neural-network-based transformation approach is studied to handle distribution mismatches between training and testing data. The probability density functions of the statistical classifiers serve as the objective function of the neural network. The network is trained to maximize the likelihood of data from the testing environment, and this allows global optimization of the network when it is used with the statistical pattern classifiers. The proposed approach is applied to automatic speech recognition of noisy distant-talking speech, where it reduces the error rate by 52.9%.
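
The idea of using the classifier's density as the network's objective can be illustrated with a minimal sketch. It is not the paper's implementation and rests on several assumptions: a single affine layer stands in for the neural network, fixed diagonal Gaussians stand in for the classifier's densities, the frame-to-state alignment is taken as known, and the distortion `0.5 * clean + 1.0` is an invented example. All names (`log_likelihood`, `train_transform`) are hypothetical.

```python
import numpy as np

def log_likelihood(y, means, variances, states):
    """Average diagonal-Gaussian log-density of frames y under their aligned states."""
    mu = means[states]
    var = variances[states]
    per_frame = np.sum(-0.5 * (np.log(2 * np.pi * var) + (y - mu) ** 2 / var), axis=1)
    return per_frame.mean()

def train_transform(x, means, variances, states, lr=0.05, steps=500):
    """Gradient ascent on the log-likelihood for an affine layer y = x @ W + b.

    The classifier parameters (means, variances) stay fixed; only the
    transformation is adapted to the test-environment data.
    """
    dim = x.shape[1]
    W = np.eye(dim)      # start from the identity (no transformation)
    b = np.zeros(dim)
    mu = means[states]
    var = variances[states]
    for _ in range(steps):
        y = x @ W + b
        grad_y = (mu - y) / var            # gradient of log N(y; mu, var) w.r.t. y
        W += lr * (x.T @ grad_y) / len(x)  # chain rule through y = x @ W + b
        b += lr * grad_y.mean(axis=0)
    return W, b

# Toy data: three "states" with known Gaussians (assumed, not from the paper).
rng = np.random.default_rng(0)
means = np.array([[-2.0, 0.0], [2.0, 1.0], [0.0, -2.0]])
variances = np.ones_like(means)
states = rng.integers(0, 3, size=600)
clean = means[states] + rng.normal(size=(600, 2))
noisy = 0.5 * clean + 1.0  # hypothetical environment mismatch

W, b = train_transform(noisy, means, variances, states)
before = log_likelihood(noisy, means, variances, states)
after = log_likelihood(noisy @ W + b, means, variances, states)
print(f"log-likelihood before: {before:.3f}, after: {after:.3f}")
```

Because each frame is scored against the Gaussian of its own aligned state, the transform cannot simply collapse every frame onto one mean. In a full recognizer the alignment would come from the classifier itself rather than being given.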
