Prototype-based minimum error training for speech recognition

A key concept in pattern recognition is that a pattern recognizer should be designed so as to minimize the errors it makes in classifying patterns. In this article, we review a recent, promising approach for minimizing the error rate of a classifier and describe a particular application to a simple, prototype-based speech recognizer. The key idea is to define a smooth, differentiable loss function that incorporates all adaptable classifier parameters and that approximates the actual performance error rate. Gradient descent can then be used to minimize this loss. This approach allows but does not require the use of explicitly probabilistic models. Furthermore, minimum error training does not involve the estimation of probability distributions that are difficult to obtain reliably. This new method has been applied to a variety of pattern recognition problems, with good results. Here we describe a particular application in which a relatively simple distance-based classifier is trained to minimize errors in speech recognition tasks. The loss function is defined so as to reflect errors at the level of the final, grammar-driven recognition output. Thus, minimization of this loss directly optimizes the overall system performance.
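The core idea described above can be illustrated with a small sketch. This is not the paper's exact formulation; it is a minimal, assumed variant with one prototype per class, the discriminant defined as the negative squared distance to the class prototype, the misclassification measure taken as the gap between the best competitor and the correct class, and a sigmoid of that measure as the smooth, differentiable loss minimized by gradient descent:

```python
import numpy as np

def mce_train(X, y, prototypes, lr=0.1, xi=1.0, epochs=20):
    """Hypothetical minimum-classification-error training sketch.

    X          : (n_samples, dim) training vectors
    y          : (n_samples,) integer class labels
    prototypes : (n_classes, dim) array, one prototype per class
    xi         : slope of the sigmoid loss
    """
    for _ in range(epochs):
        for x, k in zip(X, y):
            # Discriminant for class c: g_c(x) = -||x - m_c||^2
            g = -((prototypes - x) ** 2).sum(axis=1)
            rivals = np.delete(np.arange(len(g)), k)
            j = rivals[np.argmax(g[rivals])]        # best competing class
            d = g[j] - g[k]                         # misclassification measure
            ell = 1.0 / (1.0 + np.exp(-xi * d))     # smooth sigmoid loss
            grad = xi * ell * (1.0 - ell)           # d(ell)/d(d)
            # Gradient descent: pull the correct prototype toward x,
            # push the best rival away from x.
            prototypes[k] += lr * grad * 2.0 * (x - prototypes[k])
            prototypes[j] -= lr * grad * 2.0 * (x - prototypes[j])
    return prototypes
```

Because the sigmoid approaches a 0/1 step as `xi` grows, the summed loss approaches the empirical error count; with a moderate `xi` it stays differentiable, which is what makes gradient descent applicable.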