Efficient normalization based upon GPD [generalized probabilistic descent]

We propose a simple but powerful method for normalizing various sources of mismatch between training and testing conditions in speech recognizers, based on a training methodology called the generalized probabilistic descent (GPD) method. In this framework, a gradient-based method adapts the parameters of the feature extraction process to minimize the distortion between new speech data and existing classifier models, whereas most conventional normalization and adaptation methods adapt the classification parameters. GPD was originally proposed as a general discriminative training method for pattern recognizers such as neural networks. Until now it has been used only for classifier design, sometimes in combination with the design of a non-adaptive feature extractor. This paper, in contrast, studies the benefits of GPD for adaptive training, in which the feature extractor is normalized to a new pattern environment. Experiments applying this technique to Japanese vowel classification demonstrate error-rate reductions of as much as 40%.
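The core idea, adapting feature-extraction parameters by gradient descent while the classifier models stay fixed, can be illustrated with a minimal sketch. This is not the authors' implementation. It assumes a nearest-prototype classifier with squared Euclidean distance, a feature extractor whose only adaptable parameter is an additive bias vector, and a sigmoid-smoothed misclassification loss of the kind commonly used with GPD. All names (`adapt_bias`, `PROTOTYPES`, `alpha`, `lr`) and the toy data are hypothetical.

```python
import math

# Fixed classifier models (assumed: one prototype per class). Never updated.
PROTOTYPES = [[0.0, 0.0], [2.0, 0.0]]


def transform(x, bias):
    """Hypothetical feature extractor: raw features plus an adaptable bias."""
    return [xi + bi for xi, bi in zip(x, bias)]


def sq_dist(a, b):
    return sum((ai - bi) ** 2 for ai, bi in zip(a, b))


def classify(z, prototypes):
    """Nearest-prototype decision."""
    return min(range(len(prototypes)), key=lambda k: sq_dist(z, prototypes[k]))


def error_rate(data, labels, bias, prototypes):
    wrong = sum(classify(transform(x, bias), prototypes) != y
                for x, y in zip(data, labels))
    return wrong / len(data)


def adapt_bias(data, labels, prototypes, alpha=1.0, lr=0.5, epochs=200):
    """Batch gradient descent on the bias only; prototypes stay untouched."""
    dim = len(data[0])
    bias = [0.0] * dim
    for _ in range(epochs):
        grad = [0.0] * dim
        for x, y in zip(data, labels):
            z = transform(x, bias)
            # Closest competing class for this sample.
            rival = min((k for k in range(len(prototypes)) if k != y),
                        key=lambda k: sq_dist(z, prototypes[k]))
            # Misclassification measure: positive when the rival is closer.
            d = sq_dist(z, prototypes[y]) - sq_dist(z, prototypes[rival])
            loss = 1.0 / (1.0 + math.exp(-alpha * d))  # smoothed 0/1 loss
            slope = alpha * loss * (1.0 - loss)        # d(loss)/d(d)
            for i in range(dim):
                # d(d)/d(bias_i) = 2 * (rival_i - correct_i)
                grad[i] += slope * 2.0 * (prototypes[rival][i] - prototypes[y][i])
        bias = [b - lr * g / len(data) for b, g in zip(bias, grad)]
    return bias


# Toy adaptation data: clean samples shifted by an environment mismatch.
clean = [[-0.2, 0.1], [0.0, -0.1], [0.2, 0.0], [0.1, 0.2],
         [1.8, 0.1], [2.0, -0.1], [2.2, 0.0], [2.1, -0.2]]
labels = [0, 0, 0, 0, 1, 1, 1, 1]
shift = [1.5, 0.0]
mismatched = [[c + s for c, s in zip(x, shift)] for x in clean]

before = error_rate(mismatched, labels, [0.0, 0.0], PROTOTYPES)
bias = adapt_bias(mismatched, labels, PROTOTYPES)
after = error_rate(mismatched, labels, bias, PROTOTYPES)
print(before, after, bias)
```

On this toy data the learned bias moves against the injected shift, so the adapted features line up with the unchanged prototypes again. A realistic feature extractor would have richer parameters (for example filter-bank or warping parameters), but the gradient would flow to them in the same way.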
