Hybrid HMM / Neural Network based Speech Recognition in Loquendo ASR

This paper describes hybrid Hidden Markov Model / Artificial Neural Network (HMM/ANN) models for speech recognition, and in particular the Loquendo HMM/ANN, which is the core of Loquendo ASR. While Hidden Markov Models (HMMs) are the dominant approach in most state-of-the-art speaker-independent, continuous speech recognition systems (and commercial products), Artificial Neural Networks (ANNs) are widely known as one of the most powerful nonlinear methods for pattern recognition, time series prediction, optimization and forecasting. The hybrid HMM/ANN, introduced in the nineties for speech recognition, is presently a very competitive alternative to the HMM in terms of both performance and recognition accuracy. HMM/ANN combines the advantages of both approaches by using an ANN (a multilayer perceptron), instead of Gaussian mixtures, to estimate the state-dependent observation probabilities of an HMM, while the temporal aspects of speech are handled by left-to-right HMMs. HMM/ANN models support discriminative training, can incorporate multiple input sources, and have a flexible architecture that easily accommodates contextual inputs and feedback. Furthermore, ANNs are typically highly parallel and regular structures, which makes them especially suited to high-performance architectures and optimized implementations.
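The division of labour described above can be illustrated with a minimal sketch; it is not Loquendo's implementation. It assumes a one-hidden-layer perceptron with a softmax output, and it assumes the convention, common in hybrid systems but not stated in this abstract, of dividing the network posteriors by state priors before using them as HMM emission scores. All function and parameter names are hypothetical.

```python
import numpy as np

def mlp_posteriors(frames, w1, b1, w2, b2):
    """Hypothetical one-hidden-layer MLP: P(state | frame) for every frame."""
    hidden = np.tanh(frames @ w1 + b1)
    logits = hidden @ w2 + b2
    logits = logits - logits.max(axis=1, keepdims=True)  # numerical stability
    exp = np.exp(logits)
    return exp / exp.sum(axis=1, keepdims=True)

def scaled_log_likelihoods(posteriors, priors):
    """Assumed convention: emission score = log posterior - log state prior."""
    return np.log(posteriors) - np.log(priors)

def viterbi_left_to_right(log_emit, log_self, log_next):
    """Best state path through a left-to-right HMM.

    log_emit: (T, S) per-frame emission scores.
    log_self[s]: log probability of staying in state s.
    log_next[s]: log probability of moving from state s to s + 1.
    The path is forced to start in state 0 and end in state S - 1.
    """
    T, S = log_emit.shape
    score = np.full((T, S), -np.inf)
    back = np.zeros((T, S), dtype=int)
    score[0, 0] = log_emit[0, 0]
    for t in range(1, T):
        for s in range(S):
            stay = score[t - 1, s] + log_self[s]
            move = score[t - 1, s - 1] + log_next[s - 1] if s > 0 else -np.inf
            if move > stay:
                score[t, s], back[t, s] = move + log_emit[t, s], s - 1
            else:
                score[t, s], back[t, s] = stay + log_emit[t, s], s
    # Trace the best path backwards from the final state.
    path = [S - 1]
    for t in range(T - 1, 0, -1):
        path.append(int(back[t, path[-1]]))
    return path[::-1], float(score[T - 1, S - 1])
```

In this sketch the network only scores individual frames, while the left-to-right transition structure decides how those frame scores are chained over time, which mirrors the split between the ANN and the HMM described in the abstract.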
