Fast Speaker Adaptation in Automatic Online Subtitling

This paper deals with speaker adaptation techniques well suited to the task of online subtitling. Two methods are briefly discussed, namely MAP adaptation and fMLLR. The main emphasis is on improvements to the adaptation process under the time constraints of online operation. Since the adaptation data are gathered continuously, simple modifications of the accumulated statistics have to be carried out to make the adaptation more accurate. Another proposed improvement efficiently combines fMLLR and MAP. In online adaptation, no prior transcriptions of the data are available; they are produced by the recognition system itself, so a suitable confidence measure should be assigned to each transcription. We performed experiments focused on the trade-off between the adaptation speed and the amount of adaptation data, and achieved a relative WER reduction of 16.2%.
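As an illustration of how continuously gathered statistics can drive MAP adaptation, the following minimal sketch applies the standard MAP mean update to a single Gaussian and discards frames whose transcription confidence is too low. It is not the system described in the paper: the class name `MapMeanAccumulator`, the prior weight `tau`, the threshold `min_confidence`, and the hard-rejection rule are all assumptions made for this example.

```python
from dataclasses import dataclass, field
from typing import List, Sequence


@dataclass
class MapMeanAccumulator:
    """Running statistics for MAP adaptation of one Gaussian mean (illustrative)."""

    prior_mean: List[float]
    tau: float = 10.0             # prior weight; hypothetical value
    min_confidence: float = 0.5   # rejection threshold; hypothetical value
    occupancy: float = 0.0        # sum of posteriors of accepted frames
    weighted_sum: List[float] = field(default_factory=list)

    def __post_init__(self) -> None:
        # Start the first-order statistic at zero for every dimension.
        self.weighted_sum = [0.0] * len(self.prior_mean)

    def accumulate(self, frame: Sequence[float], posterior: float,
                   confidence: float) -> None:
        """Add one frame unless its transcription confidence is too low."""
        if confidence < self.min_confidence:
            return  # assumption: unreliable frames are simply skipped
        self.occupancy += posterior
        for i, x in enumerate(frame):
            self.weighted_sum[i] += posterior * x

    def adapted_mean(self) -> List[float]:
        """Standard MAP mean: (tau * prior + sum(gamma * x)) / (tau + sum(gamma))."""
        denom = self.tau + self.occupancy
        return [(self.tau * m + s) / denom
                for m, s in zip(self.prior_mean, self.weighted_sum)]


if __name__ == "__main__":
    acc = MapMeanAccumulator(prior_mean=[0.0, 0.0], tau=2.0)
    acc.accumulate([1.0, 2.0], posterior=1.0, confidence=0.9)
    acc.accumulate([3.0, 4.0], posterior=1.0, confidence=0.9)
    acc.accumulate([100.0, 100.0], posterior=1.0, confidence=0.1)  # rejected
    print(acc.adapted_mean())
```

With no accepted data the adapted mean equals the prior mean, and as the occupancy grows it moves toward the sample mean of the accepted frames, which is one way to picture the trade-off between adaptation speed and the amount of adaptation data.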
