Prior information for rapid speaker adaptation

Rapidly adapting a speech recognition system to new speakers using a small amount of adaptation data is important to improve initial user experience. In this paper, a count-smoothing framework for incorporating prior information is extended to allow for the use of different forms of dynamic prior and improve the robustness of transform estimation on small amounts of data. Prior information is obtained from existing rapid adaptation techniques like VTLN and PCMLLR. Results using VTLN as a dynamic prior for CMLLR estimation show that transforms estimated on just one utterance can yield relative gains of 15% and 46% over a baseline gender independent model on two tasks. Index Terms: automatic speech recognition, speaker adaptation, VTLN, prior knowledge

[1]  Mark J. F. Gales,et al.  Incremental predictive and adaptive noise compensation , 2009, 2009 IEEE International Conference on Acoustics, Speech and Signal Processing.

[2]  Mark J. F. Gales Cluster adaptive training of hidden Markov models , 2000, IEEE Trans. Speech Audio Process..

[3]  Li Lee,et al.  A frequency warping approach to speaker normalization , 1998, IEEE Trans. Speech Audio Process..

[4]  Mark J. F. Gales,et al.  Using VTLN for broadcast news transcription , 2004, INTERSPEECH.

[5]  Chin-Hui Lee,et al.  Structural maximum a posteriori linear regression for fast HMM adaptation , 2002, Comput. Speech Lang..

[6]  Mark J. F. Gales,et al.  Maximum likelihood linear transformations for HMM-based speech recognition , 1998, Comput. Speech Lang..

[7]  Takao Kobayashi,et al.  Constrained structural maximum a posteriori linear regression for average-voice-based speech synthesis , 2006, INTERSPEECH.

[8]  Philip C. Woodland,et al.  Maximum likelihood linear regression for speaker adaptation of continuous density hidden Markov models , 1995, Comput. Speech Lang..

[9]  Mark J. F. Gales,et al.  Predictive linear transforms for noise robust speech recognition , 2007, 2007 IEEE Workshop on Automatic Speech Recognition & Understanding (ASRU).

[10]  Srinivasan Umesh,et al.  A computationally efficient approach to warp factor estimation in VTLN using EM algorithm and sufficient statistics , 2008, INTERSPEECH.

[11]  Chin-Hui Lee,et al.  Maximum a posteriori linear regression for hidden Markov model adaptation , 1999, EUROSPEECH.

[12]  Li Deng,et al.  HMM adaptation using vector taylor series for noisy speech recognition , 2000, INTERSPEECH.

[13]  Mark J. F. Gales,et al.  Improving joint uncertainty decoding performance by predictive methods for noise robust speech recognition , 2009, 2009 IEEE Workshop on Automatic Speech Recognition & Understanding.