A posteriori and a priori transformations for speaker adaptation in large vocabulary speech recognition systems

The speaker-dependent HMM-based recognizers gives lower word error rates in comparison with the corresponding speaker-independent recognizers. The aim of speaker adaptation techniques is to enhance the speakerindependent acoustic models to bring their recognition accuracy as close as possible to the one obtained with speaker-dependent models. In this paper, we propose a method using test and training data for acoustic model adaptation. This method operates in two steps. The first one performs an a priori adaptation using the transcribed training data of the closest training speakers to the test speaker. This adaptation is done with MAP procedure allowing reduced variances in the acoustic models. The second one performs an a posteriori adaptation using the MLLR procedure on the test data, allowing mapping of Gaussians means to match the test speaker’s acoustic space. This adaptation strategy was evaluated in a large vocabulary speech recognition task. Our method leads to a relative gain of 15% with respect to the baseline system and 10% with respect to the conventional MLLR adaptation.

[1]  Yunxin Zhao,et al.  An acoustic-phonetic-based speaker adaptation technique for improving speaker-independent continuous speech recognition , 1994, IEEE Trans. Speech Audio Process..

[2]  Richard M. Schwartz,et al.  A compact model for speaker-adaptive training , 1996, Proceeding of Fourth International Conference on Spoken Language Processing. ICSLP '96.

[3]  Chin-Hui Lee,et al.  Maximum a posteriori estimation for multivariate Gaussian mixture observations of Markov chains , 1994, IEEE Trans. Speech Audio Process..

[4]  Richard M. Schwartz,et al.  Adaptation to new microphones using tied-mixture normalization , 1994, Proceedings of ICASSP '94. IEEE International Conference on Acoustics, Speech and Signal Processing.

[5]  Jean-François Bonastre,et al.  AMIRAL: A Block-Segmental Multirecognizer Architecture for Automatic Speaker Recognition , 2000, Digit. Signal Process..

[6]  Maxine Eskénazi,et al.  BREF, a large vocabulary spoken corpus for French , 1991, EUROSPEECH.

[7]  Michael Picheny,et al.  The metamorphic algorithm: a speaker mapping approach to data augmentation , 1994, IEEE Trans. Speech Audio Process..

[8]  Michael Picheny,et al.  Speaker clustering and transformation for speaker adaptation in speech recognition systems , 1998, IEEE Trans. Speech Audio Process..

[9]  Herbert Gish,et al.  A parametric approach to vocal tract length normalization , 1996, 1996 IEEE International Conference on Acoustics, Speech, and Signal Processing Conference Proceedings.

[10]  Philip C. Woodland,et al.  Maximum likelihood linear regression for speaker adaptation of continuous density hidden Markov models , 1995, Comput. Speech Lang..

[11]  Biing-Hwang Juang,et al.  Signal bias removal by maximum likelihood estimation for robust telephone speech recognition , 1996, IEEE Trans. Speech Audio Process..