Paper information - Maximum mutual information neural networks for hybrid connectionist-HMM speech recognition systems

This paper proposes a novel approach for a hybrid connectionist-hidden Markov model (HMM) speech recognition system based on the use of a neural network as a vector quantizer. The neural network is trained with a new learning algorithm offering the following innovations. (1) It is an unsupervised learning algorithm for perceptron-like neural networks that are usually trained in the supervised mode. (2) Information theory principles are used as learning criteria, making the network especially suitable for combination with an HMM-based speech recognition system. (3) The neural network is not trained using the standard error-backpropagation algorithm but using instead a newly developed self-organizing learning approach. The use of the hybrid system with the neural vector quantizer results in a 25% error reduction compared with the same HMM system using a standard k-means vector quantizer. The training algorithm can be further refined by using a combination of unsupervised and supervised learning algorithms. Finally, it is demonstrated how the new learning approach can be applied to multiple-feature hybrid speech recognition systems, using a joint information theory-based optimization procedure for the multiple neural codebooks, resulting in a 30% error reduction.
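The abstract does not state the training criterion as a formula. As a rough illustration only, the sketch below computes the standard empirical mutual information I(Y; W) between a sequence of vector quantizer labels Y and a sequence of class labels W (for example, phonetic classes). This is the textbook definition, not the paper's learning algorithm, and the function name `mutual_information` and its arguments are hypothetical.

```python
import math
from collections import Counter


def mutual_information(labels, classes):
    """Empirical mutual information I(Y; W), in bits, between two aligned
    sequences: quantizer labels Y and class labels W.

    Assumption: this is the standard plug-in estimate from co-occurrence
    counts, used here only to illustrate the kind of quantity an
    information-theoretic criterion measures.
    """
    if len(labels) != len(classes):
        raise ValueError("sequences must have equal length")
    n = len(labels)
    joint = Counter(zip(labels, classes))  # counts of (y, w) pairs
    count_y = Counter(labels)              # marginal counts of y
    count_w = Counter(classes)             # marginal counts of w
    mi = 0.0
    for (y, w), c in joint.items():
        # p(y, w) * log2( p(y, w) / (p(y) * p(w)) ), written with counts
        mi += (c / n) * math.log2(c * n / (count_y[y] * count_w[w]))
    return mi


# Labels that identify the class exactly carry the full class entropy.
informative = mutual_information([0, 0, 1, 1], ["a", "a", "b", "b"])
# Labels that are independent of the class carry no information.
uninformative = mutual_information([0, 1, 0, 1], ["a", "a", "b", "b"])
```

Under this measure, a quantizer whose labels separate the classes scores higher than one whose labels are unrelated to them, which is the intuition behind preferring an information-theoretic criterion over a pure distortion criterion such as k-means.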

Gerhard Rigoll
