Prototype-based MCE/GPD training for word spotting and connected word recognition

This paper describes a straightforward application of prototype-based minimum error classifier (PBMEC) training to existing techniques for handling continuous speech. It defines a novel minimum classification error/generalized probabilistic descent (MCE/GPD) loss function that can incorporate word spotting errors and other measures of symbolic distance between correct and incorrect categories. Classification consists of a time-synchronous dynamic time warping (DTW) pass through a finite state machine. Adaptation uses an A*-based N-best algorithm and propagates the derivative of the loss over the N best paths through the finite state machine. The key feature is that the loss function being optimized closely reflects the actual recognition performance of the system.
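The abstract does not give the loss itself, so the following is only a minimal sketch of the generic MCE construction from the wider MCE/GPD literature, not the novel loss defined in the paper. It assumes each of the N best paths has a positive, distance-like DTW cost (lower is better), combines the competing paths with a soft minimum, and passes the resulting misclassification measure through a sigmoid. The function name `mce_loss_and_grads` and the parameters `eta` and `alpha` are hypothetical, and no weighting by word spotting errors or symbolic distance is attempted.

```python
import math


def mce_loss_and_grads(correct_cost, competitor_costs, eta=4.0, alpha=1.0):
    """Generic smoothed MCE loss for one utterance (illustrative only).

    correct_cost     -- positive path cost of the correct hypothesis
    competitor_costs -- positive path costs of the incorrect N-best hypotheses
    eta              -- assumed smoothing exponent; large eta approaches a hard min
    alpha            -- assumed sigmoid slope
    Returns (d, loss, grad_correct, grad_competitors), where the gradients
    are the derivatives of the loss with respect to each path cost.
    """
    n = len(competitor_costs)
    # Soft minimum over competitor costs: (mean of c^-eta)^(-1/eta).
    mean_inv = sum(c ** -eta for c in competitor_costs) / n
    soft_min = mean_inv ** (-1.0 / eta)

    # Misclassification measure: positive when the correct path costs more
    # than the (soft) best competitor, i.e. when a recognition error occurs.
    d = correct_cost - soft_min

    # Sigmoid turns d into a differentiable approximation of the 0/1 error.
    loss = 1.0 / (1.0 + math.exp(-alpha * d))
    dloss_dd = alpha * loss * (1.0 - loss)

    # Chain rule: d(soft_min)/d(c_j) = (soft_min / c_j)^(eta + 1) / n.
    grad_correct = dloss_dd
    grad_competitors = [
        -dloss_dd * (soft_min / c) ** (eta + 1.0) / n for c in competitor_costs
    ]
    return d, loss, grad_correct, grad_competitors


if __name__ == "__main__":
    d, loss, g_ok, g_bad = mce_loss_and_grads(1.0, [2.0, 3.0])
    print(d, loss, g_ok, g_bad)
```

In a full system these per-path derivatives would be pushed further back along each path's alignment to the prototypes it visits; that step depends on the local distance measure and is not shown here.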
