Prototype-based minimum classification error/generalized probabilistic descent training for various speech units

Abstract In previous work we reported high classification rates for learning vector quantization (LVQ) networks trained to classify phoneme tokens shifted in time. It has since been shown that the framework of minimum classification error (MCE) and generalized probabilistic descent (GPD) can treat LVQ as a special case of a general method for gradient descent on a rigorously defined classification loss measure that closely reflects the misclassification rate. This framework allows us to extend LVQ into a prototype-based minimum error classifier (PBMEC) appropriate for the classification of various speech units which the original LVQ was unable to treat. Speech categories are represented using a prototype-based multi-state architecture incorporating a dynamic time warping procedure. We present results for the difficult E-set task, as well as for isolated word recognition for a vocabulary of 5240 words, that reveal clear gains in performance as a result of using PBMEC. In addition, we discuss the issue of smoothing the loss function from the perspective of increasing classifier robustness.

[1]  Shun-ichi Amari,et al.  A Theory of Adaptive Pattern Classifiers , 1967, IEEE Trans. Electron. Comput..

[2]  T. Kohonen,et al.  Statistical pattern recognition with neural networks: benchmarking studies , 1988, IEEE 1988 International Conference on Neural Networks.

[3]  Shigeru Katagiri,et al.  Shift-invariant, multi-category phoneme recognition using Kohonen's LVQ2 , 1989, International Conference on Acoustics, Speech, and Signal Processing,.

[4]  Alex Waibel,et al.  Consonant recognition by modular construction of large phonemic time-delay neural networks , 1989, International Conference on Acoustics, Speech, and Signal Processing,.

[5]  Frank K. Soong,et al.  A Tree.Trellis Based Fast Search for Finding the N Best Sentence Hypotheses in Continuous Speech Recognition , 1990, HLT.

[6]  Shigeru Katagiri,et al.  A generalized probabilistic descent method , 1990 .

[7]  E. Mcdermott,et al.  LVQ3 for phoneme recognition , 1990 .

[8]  Alex Waibel,et al.  Integrating time alignment and neural networks for high performance continuous speech recognition , 1991, [Proceedings] ICASSP 91: 1991 International Conference on Acoustics, Speech, and Signal Processing.

[9]  H. Sawai TDNN-LR continuous speech recognition system using adaptive incremental TDNN training , 1991, [Proceedings] ICASSP 91: 1991 International Conference on Acoustics, Speech, and Signal Processing.

[10]  Shigeru Katagiri,et al.  Speaker-independent large vocabulary word recognition using an LVQ/HMM hybrid algorithm , 1991, [Proceedings] ICASSP 91: 1991 International Conference on Acoustics, Speech, and Signal Processing.

[11]  Chin-Hui Lee,et al.  Robustness and discrimination oriented speech recognition using weighted HMM and subspace projection approaches , 1991, [Proceedings] ICASSP 91: 1991 International Conference on Acoustics, Speech, and Signal Processing.

[12]  Shigeki Sagayama,et al.  Appropriate error criterion selection for continuous speech HMM minimum error training , 1992, ICSLP.

[13]  Chin-Hui Lee,et al.  Segmental GPD training of HMM based speech recognizer , 1992, [Proceedings] ICASSP-92: 1992 IEEE International Conference on Acoustics, Speech, and Signal Processing.

[14]  Biing-Hwang Juang,et al.  Discriminative template training for dynamic programming speech recognition , 1992, [Proceedings] ICASSP-92: 1992 IEEE International Conference on Acoustics, Speech, and Signal Processing.

[15]  Shigeru Katagiri,et al.  Prototype-based discriminative training for various speech units , 1992, [Proceedings] ICASSP-92: 1992 IEEE International Conference on Acoustics, Speech, and Signal Processing.

[16]  Biing-Hwang Juang,et al.  Discriminative training of dynamic programming based speech recognizers , 1993, IEEE Trans. Speech Audio Process..

[17]  Bernie Mulgrew,et al.  IEEE Workshop on Neural Networks for Signal Processing , 1995 .