Task adaptation of acoustic and language models based on large quantities of data

We investigate the use of large amounts (over 1,500 hours) of untranscribed data recorded from a deployed conversational system to improve its acoustic and language models. The system under consideration allows users to perform transactions on their retirement accounts. Using all of the untranscribed data, we obtain over 19% relative improvement in word error rate over a baseline system. In contrast, a system built using 70 hours of transcribed data yields over 31% relative improvement.
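As one plausible way to turn untranscribed recordings into adaptation material, recognizer hypotheses can be filtered by confidence and treated as pseudo-transcripts. The sketch below is a minimal illustration of that idea; the data layout, function name, and threshold are illustrative assumptions, not the paper's actual procedure or code.

```python
# Minimal sketch: select confidently recognized utterances from untranscribed
# data so they can serve as pseudo-transcribed material for acoustic and
# language model adaptation. All names and the threshold are hypothetical.

from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class DecodedUtterance:
    audio_path: str    # path to the untranscribed recording
    hypothesis: str    # 1-best recognizer output used as a pseudo-transcript
    confidence: float  # utterance-level confidence score in [0, 1]


def select_for_adaptation(
    utterances: List[DecodedUtterance],
    min_confidence: float = 0.9,  # hypothetical cutoff; tuned in practice
) -> List[Tuple[str, str]]:
    """Keep only confidently recognized utterances as
    (audio, pseudo-transcript) pairs for model adaptation."""
    return [
        (u.audio_path, u.hypothesis)
        for u in utterances
        if u.confidence >= min_confidence
    ]
```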
