Efficient normalization based upon GPD [generalized probabilistic descent]

We propose a simple but powerful method for normalizing various sources of mismatch between training and testing conditions in speech recognizers, based on a training methodology called the generalized probabilistic descent method (GPD). In this new framework, a gradient-based method adapts the parameters of the feature extraction process to minimize the distortion between new speech data and existing classifier models, whereas most conventional normalization and adaptation methods adapt the classifier parameters instead. GPD was proposed as a general discriminative training method for pattern recognizers such as neural networks. Until now, it has been used only for classifier design, sometimes in combination with the design of a non-adaptive feature extractor. This paper, in contrast, studies the benefits of GPD as an adaptive training method for normalizing the feature extractor to a new pattern environment. Experiments on Japanese vowel classification show that the technique can reduce error rates by as much as 40%.
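
The core idea, adapting the feature extractor by gradient descent while the classifier models stay fixed, can be illustrated with a minimal sketch. This is not the paper's implementation: it assumes a hypothetical per-dimension affine feature transform (`scale`, `bias`), a prototype-based classifier, labeled adaptation data, and a plain squared-distance distortion in place of the smoothed classification loss that GPD training normally uses. All names below are illustrative.

```python
import numpy as np

def distortion(x, labels, prototypes, scale, bias):
    """Mean squared distance between transformed features and their class prototypes."""
    err = x * scale + bias - prototypes[labels]
    return float(np.mean(np.sum(err ** 2, axis=1)))

def adapt_feature_transform(x, labels, prototypes, lr=0.1, epochs=1000):
    """Gradient descent on an assumed affine feature transform.

    The classifier prototypes are never updated; only the feature-side
    parameters (scale, bias) move, which mirrors the idea of normalizing
    the feature extractor rather than adapting the classifier.
    """
    d = x.shape[1]
    scale = np.ones(d)    # start from the identity transform
    bias = np.zeros(d)
    targets = prototypes[labels]
    for _ in range(epochs):
        err = x * scale + bias - targets
        # Gradients of the mean squared distortion w.r.t. scale and bias.
        grad_scale = 2.0 * np.mean(err * x, axis=0)
        grad_bias = 2.0 * np.mean(err, axis=0)
        scale -= lr * grad_scale
        bias -= lr * grad_bias
    return scale, bias

def classify(x, prototypes, scale, bias):
    """Nearest-prototype classification of the transformed features."""
    y = x * scale + bias
    d2 = ((y[:, None, :] - prototypes[None, :, :]) ** 2).sum(axis=2)
    return d2.argmin(axis=1)
```

In this sketch the step size and epoch count are fixed constants chosen for small, well-scaled features; a practical system would need to tune them and would typically replace the squared-distance objective with a discriminative loss.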
