Improving statistical speech recognition

This paper summarizes the theory of the hybrid connectionist HMM (hidden Markov model) continuous speech recognition system. Experimental results indicate that connectionist methods can significantly improve a context-independent HMM system, bringing its performance close to that of a state-of-the-art context-dependent system of much higher complexity. Further results demonstrate that a state-of-the-art context-dependent HMM system can itself be significantly improved by interpolating context-independent connectionist probability estimates. Finally, the development of a principled network decomposition method is reported, which allows efficient and parsimonious modeling of context-dependent phones with no independence assumptions.
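The three ideas named in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation, and the abstract does not give these formulas. The sketch assumes the usual hybrid convention of dividing network posteriors by class priors to obtain scaled likelihoods, a log-domain linear interpolation with a hypothetical `weight` parameter, and a chain-rule factorization P(q, c | x) = P(q | x) P(c | q, x) for a phone q in context c. All function names and numeric values are illustrative assumptions.

```python
import math


def scaled_likelihoods(posteriors, priors):
    """Turn network posteriors P(q|x) into scaled likelihoods p(x|q)/p(x).

    Assumption: Bayes' rule with class priors P(q), as is common in
    hybrid connectionist HMM systems.
    """
    return [p / pr for p, pr in zip(posteriors, priors)]


def interpolate_log(hmm_loglik, net_loglik, weight):
    """Linearly interpolate two log scores.

    Assumption: the interpolation form and `weight` are hypothetical;
    the abstract only says the estimates are interpolated.
    """
    return (1.0 - weight) * hmm_loglik + weight * net_loglik


def context_dependent_posterior(p_phone, p_context_given_phone):
    """Chain rule: P(q, c | x) = P(q | x) * P(c | q, x).

    The factorization is exact, so no independence assumption is made.
    """
    return {
        (q, c): p_phone[q] * pc
        for q, contexts in p_context_given_phone.items()
        for c, pc in contexts.items()
    }


# Illustrative values only (not taken from the paper).
likes = scaled_likelihoods([0.75, 0.25], [0.5, 0.5])
mixed = interpolate_log(-2.0, -4.0, 0.5)
joint = context_dependent_posterior(
    {"a": 0.75, "b": 0.25},
    {"a": {"left": 0.5, "right": 0.5}, "b": {"left": 1.0}},
)
```

Because the factorization is exact, the joint posteriors over all (phone, context) pairs still sum to one, while each factor can be estimated by a smaller network than one with an output per context-dependent class.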
