Improving State-of-the-Art Continuous Speech Recognition Systems Using the N-Best Paradigm with Neural Networks

In an effort to advance the state of the art in continuous speech recognition employing hidden Markov models (HMM), Segmental Neural Nets (SNN) were introduced recently to ameliorate the well-known limitations of HMMs, namely, the conditional-independence limitation and the relative difficulty with which HMMs can handle segmental features. We describe a hybrid SNN/HMM system that combines the speed and performance of our HMM system with the segmental modeling capabilities of SNNs. The integration of the two acoustic modeling techniques is achieved successfully via the N-best rescoring paradigm. The N-best lists are used not only for recognition, but also during training. This discriminative training using N-best is demonstrated to improve performance. When tested on the DARPA Resource Management speaker-independent corpus, the hybrid SNN/HMM system decreases the error by about 20% compared to the state-of-the-art HMM system.

[1]  R. Schwartz,et al.  The N-best algorithms: an efficient and exact procedure for finding the N most likely sentence hypotheses , 1990, International Conference on Acoustics, Speech, and Signal Processing.

[2]  George Zavaliagkos,et al.  Continuous speech recognition using segmental neural nets , 1992, [Proceedings 1992] IJCNN International Joint Conference on Neural Networks.

[3]  R. Schwartz,et al.  A comparison of several approximate algorithms for finding multiple (N-best) sentence hypotheses , 1991, [Proceedings] ICASSP 91: 1991 International Conference on Acoustics, Speech, and Signal Processing.

[4]  Mari Ostendorf,et al.  Integration of Diverse Recognition Methodologies Through Reevaluation of N-Best Sentence Hypotheses , 1991, HLT.

[5]  Richard M. Schwartz,et al.  Toward a Real-Time Spoken Language System Using Commercial Hardware , 1990, HLT.

[6]  Mari Ostendorf,et al.  A stochastic segment model for phoneme-based continuous speech recognition , 1989, IEEE Trans. Acoust. Speech Signal Process..

[7]  Amro El-Jaroudi,et al.  A new error criterion for posterior probability estimation with neural nets , 1990, 1990 IJCNN International Joint Conference on Neural Networks.