A hybrid RBF-HMM system for continuous speech recognition

This paper describes a hybrid system for continuous speech recognition that combines a radial basis function (RBF) neural network with hidden Markov models (HMMs), together with discriminant training techniques. Initially, the neural network is trained to approximate the a posteriori probabilities of single HMM states. The Viterbi algorithm uses these probabilities to calculate the total scores of the individual hybrid phoneme models. The final training of the hybrid system combines the 'minimum classification error' objective function, which approximates the misclassification rate of the hybrid classifier, with the 'generalized probabilistic descent' algorithm. In continuous speech recognition experiments with phoneme units, the hybrid system achieved a phoneme recognition rate of about 63.8% in a speaker-independent task.
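
The scoring chain summarized in the abstract (RBF state posteriors, Viterbi model scores, minimum classification error loss) can be illustrated with a minimal sketch. It is not the authors' implementation. The function names, the softmax output normalization, the omission of HMM transition probabilities, and the smoothing parameters `eta` and `gamma` are all assumptions made for illustration. The loss follows the commonly used smoothed form of the minimum classification error criterion.

```python
import math


def rbf_posteriors(x, centers, widths, weights):
    """Gaussian RBF hidden layer plus a linear output layer.

    The softmax normalization is an assumption of this sketch; it only
    guarantees that the outputs can be read as state posteriors.
    """
    hidden = [
        math.exp(-sum((xi - ci) ** 2 for xi, ci in zip(x, c)) / (2.0 * w ** 2))
        for c, w in zip(centers, widths)
    ]
    acts = [sum(wk * h for wk, h in zip(row, hidden)) for row in weights]
    top = max(acts)
    exps = [math.exp(a - top) for a in acts]
    total = sum(exps)
    return [e / total for e in exps]


def viterbi_score(log_post, states):
    """Best-path log score of a left-to-right model.

    log_post[t][s] is the log posterior of network output s at frame t;
    states lists the network outputs used by the model, in order.
    Transition probabilities are ignored (an assumption of this sketch).
    """
    neg_inf = float("-inf")
    n = len(states)
    best = [neg_inf] * n
    best[0] = log_post[0][states[0]]
    for frame in log_post[1:]:
        new = [neg_inf] * n
        for j in range(n):
            # Either stay in state j or advance from state j - 1.
            prev = best[j] if j == 0 else max(best[j], best[j - 1])
            if prev > neg_inf:
                new[j] = prev + frame[states[j]]
        best = new
    return best[-1]


def mce_loss(scores, correct, eta=1.0, gamma=1.0):
    """Smoothed misclassification measure passed through a sigmoid.

    scores holds one Viterbi score per model; correct is the index of
    the true model. The loss approaches 1 for a clear error and 0 for
    a clearly correct decision.
    """
    rivals = [g for k, g in enumerate(scores) if k != correct]
    top = max(rivals)
    # Soft maximum over the competing models, computed stably.
    anti = top + math.log(
        sum(math.exp(eta * (g - top)) for g in rivals) / len(rivals)
    ) / eta
    d = anti - scores[correct]
    return 1.0 / (1.0 + math.exp(-gamma * d))
```

Because the sigmoid makes the loss differentiable in the model scores, its gradient can in principle be propagated back to the network parameters, which is what a gradient-based scheme such as generalized probabilistic descent relies on.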
