Maximum mutual information neural networks for hybrid connectionist-HMM speech recognition systems

This paper proposes a novel approach to hybrid connectionist-hidden Markov model (HMM) speech recognition in which a neural network serves as the vector quantizer. The neural network is trained with a new learning algorithm that offers the following innovations. (1) It is an unsupervised learning algorithm for perceptron-like neural networks, which are usually trained in supervised mode. (2) Information theory principles are used as the learning criteria, making the network especially suitable for combination with an HMM-based speech recognition system. (3) The neural network is trained not with the standard error-backpropagation algorithm but with a newly developed self-organizing learning approach. The hybrid system with the neural vector quantizer achieves a 25% error reduction compared with the same HMM system using a standard k-means vector quantizer. The training algorithm can be further refined by combining unsupervised and supervised learning algorithms. Finally, it is demonstrated how the new learning approach can be applied to multiple-feature hybrid speech recognition systems, using a joint information theory-based optimization procedure for the multiple neural codebooks, which results in a 30% error reduction.
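To make the information-theoretic criterion concrete, the following minimal sketch estimates the mutual information between the discrete labels emitted by a vector quantizer and a second discrete labelling of the same frames. The abstract does not state which quantities the criterion is computed between, nor how the network is optimized, so the pairing of codebook labels with class labels, the function name `mutual_information`, and the toy data are all assumptions made for illustration; this is not the paper's training algorithm.

```python
import math
from collections import Counter


def mutual_information(codes, classes):
    """Estimate I(codes; classes) in bits from two paired discrete sequences.

    Hypothetical helper: `codes` stands in for vector-quantizer labels and
    `classes` for some class labelling of the same frames (an assumption).
    """
    if len(codes) != len(classes):
        raise ValueError("sequences must have equal length")
    n = len(codes)
    if n == 0:
        raise ValueError("sequences must be non-empty")

    joint = Counter(zip(codes, classes))   # counts of (code, class) pairs
    code_counts = Counter(codes)           # marginal counts of codes
    class_counts = Counter(classes)        # marginal counts of classes

    mi = 0.0
    for (m, c), n_mc in joint.items():
        # p(m, c) * log2( p(m, c) / (p(m) * p(c)) ), written with raw counts
        mi += (n_mc / n) * math.log2(n_mc * n / (code_counts[m] * class_counts[c]))
    return mi


# Toy data (invented for illustration): four frames, two classes.
classes = ["a", "a", "b", "b"]
informative = [0, 0, 1, 1]      # codes that separate the classes perfectly
uninformative = [0, 1, 0, 1]    # codes that are independent of the classes

mi_informative = mutual_information(informative, classes)
mi_uninformative = mutual_information(uninformative, classes)
```

In this toy case the informative codebook carries one bit about the class and the uninformative one carries none, which is the kind of difference a mutual-information criterion rewards when choosing how a quantizer partitions the feature space.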
