Improving online incremental speaker adaptation with eigen feature space MLLR

This paper investigates an eigen feature-space maximum likelihood linear regression (fMLLR) scheme for improving online speaker adaptation in automatic speech recognition systems. In this stochastic-approximation-like framework, the traditional incremental fMLLR estimate is treated as a slowly changing mean of the eigen fMLLR transform. This helps adaptation at the beginning of a conversation, when only a limited amount of data is available. The scheme is shown to balance the transformation estimate according to the available data, and it yields reasonable improvements for online systems.
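To make the "slowly changing mean" idea concrete, here is a minimal sketch of one way such a scheme could be organized. It is an illustration under stated assumptions, not the paper's algorithm: the function names, the affine layout `W = [A | b]`, the fixed step size, and the way the coefficients are supplied are all hypothetical, and estimating the coefficients from adaptation data is left out entirely.

```python
import numpy as np

def eigen_fmllr_transform(mean_W, eigen_Ws, coeffs):
    """Compose W = mean_W + sum_k coeffs[k] * eigen_Ws[k].

    mean_W:   (d, d+1) affine transform acting as the mean (assumed layout [A | b]).
    eigen_Ws: (K, d, d+1) basis of eigen-transforms (assumed to be trained offline).
    coeffs:   (K,) speaker-dependent weights (their estimation is not shown here).
    """
    return mean_W + np.tensordot(coeffs, eigen_Ws, axes=1)

def update_mean(mean_W, incremental_W, step):
    """Move the mean a small step toward the latest incremental fMLLR estimate.

    A small `step` keeps the mean slowly changing; the value is an assumption.
    """
    return (1.0 - step) * mean_W + step * incremental_W

def apply_fmllr(W, feats):
    """Apply the affine transform W = [A | b] to a (T, d) block of feature frames."""
    A, b = W[:, :-1], W[:, -1]
    return feats @ A.T + b
```

With few frames, the transform is constrained to the mean plus a handful of basis weights, which needs far fewer parameters than a full fMLLR matrix; as more data arrives, the mean itself can drift toward the incremental estimate.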
