Discriminative training of dynamic programming based speech recognizers

A new minimum recognition error formulation and a generalized probabilistic descent (GPD) algorithm are analyzed and used to accomplish discriminative training of a conventional dynamic-programming-based speech recognizer. The objective of discriminative training here is to directly minimize the recognition error rate. To achieve this, a formulation that allows controlled approximation of the exact error rate and renders optimization possible is used. The GPD method is implemented in a dynamic-time-warping (DTW)-based system. A linear discriminant function on the DTW distortion sequence is used to replace the conventional average DTW path distance. A series of speaker-independent recognition experiments using the highly confusible English E-set as the vocabulary showed a recognition rate of 84.4% compared to approximately 60% for traditional template training via clustering. The experimental results verified that the algorithm converges to a solution that achieves minimum error rate. >

[1]  Chin-Hui Lee,et al.  Segmental GPD training of HMM based speech recognizer , 1992, [Proceedings] ICASSP-92: 1992 IEEE International Conference on Acoustics, Speech, and Signal Processing.

[2]  S. Agmon The Relaxation Method for Linear Inequalities , 1954, Canadian Journal of Mathematics.

[3]  Biing-Hwang Juang,et al.  Discriminative analysis of distortion sequences in speech recognition , 1993, IEEE Trans. Speech Audio Process..

[4]  Nils J. Nilsson,et al.  Learning Machines: Foundations of Trainable Pattern-Classifying Systems , 1965 .

[5]  Biing-Hwang Juang,et al.  Discriminative analysis of distortion sequences in speech recognition , 1991, [Proceedings] ICASSP 91: 1991 International Conference on Acoustics, Speech, and Signal Processing.

[6]  L. Rabiner,et al.  A modified K‐means clustering algorithm for use in speaker‐independent isolated word recognition , 1984 .

[7]  Biing-Hwang Juang,et al.  New discriminative training algorithms based on the generalized probabilistic descent method , 1991, Neural Networks for Signal Processing Proceedings of the 1991 IEEE Workshop.

[8]  R. L. Kashyap,et al.  An Algorithm for Linear Inequalities and its Applications , 1965, IEEE Trans. Electron. Comput..

[9]  T. Kohonen,et al.  Statistical pattern recognition with neural networks: benchmarking studies , 1988, IEEE 1988 International Conference on Neural Networks.

[10]  Frank K. Soong,et al.  On the use of instantaneous and transitional spectral information in speaker recognition , 1988, IEEE Trans. Acoust. Speech Signal Process..

[11]  Biing-Hwang Juang,et al.  On the use of bandpass liftering in speech recognition , 1987, IEEE Trans. Acoust. Speech Signal Process..

[12]  Shigeru Katagiri,et al.  Shift-invariant, multi-category phoneme recognition using Kohonen's LVQ2 , 1989, International Conference on Acoustics, Speech, and Signal Processing,.

[13]  Biing-Hwang Juang,et al.  Discriminative learning for minimum error classification [pattern recognition] , 1992, IEEE Trans. Signal Process..

[14]  R. Lippmann,et al.  An introduction to computing with neural nets , 1987, IEEE ASSP Magazine.

[15]  Aaron E. Rosenberg,et al.  On the use of instantaneous and transitional spectral information in speaker recognition , 1986, ICASSP '86. IEEE International Conference on Acoustics, Speech, and Signal Processing.