Speaker and noise factorisation on the AURORA4 task

For many realistic scenarios, multiple factors affect the clean speech signal. This work describes approaches to handling two such factors, speaker and background-noise differences, simultaneously. A new adaptation scheme is proposed in which the acoustic models are first adapted to the target speaker via an MLLR transform, and then to the target noise environment via model-based vector Taylor series (VTS) compensation. The speaker and noise transforms are jointly estimated using maximum likelihood. Experiments on the AURORA4 task demonstrate that this adaptation scheme outperforms VTS-based noise adaptation alone. In addition, the framework enables the speaker and noise to be factorised, allowing a speaker transform estimated in one noise condition to be successfully used in a different noise condition.
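As a minimal sketch of the combined adaptation, written in standard MLLR and VTS notation rather than the paper's own symbols (the quantities $A^{(s)}$, $b^{(s)}$, $\mu_n$, $\mu_h$ and $C$ below are generic assumptions): the speaker transform first maps a clean-speech Gaussian mean $\mu_x$ to

\[ \mu_x^{(s)} = A^{(s)} \mu_x + b^{(s)}, \]

and model-based VTS then compensates this adapted mean for an additive noise mean $\mu_n$ and a convolutional channel mean $\mu_h$, with $C$ the DCT matrix, using the usual log-add mismatch function evaluated at the expansion point:

\[ \mu_y \approx \mu_x^{(s)} + \mu_h + C \log\!\left( \mathbf{1} + \exp\!\left( C^{-1}\left( \mu_n - \mu_x^{(s)} - \mu_h \right) \right) \right). \]

In this sketch the speaker parameters $(A^{(s)}, b^{(s)})$ and the noise parameters $(\mu_n, \mu_h)$ are jointly estimated under a maximum-likelihood criterion; it is this separation of the two transforms that allows the speaker transform to be reused in a different noise condition.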
