Unsupervised Language Model Adaptation for Meeting Recognition

We present an application of unsupervised language model (LM) adaptation to meeting recognition, in a scenario where sequences of multiparty meetings on related topics are to be recognized, but no prior in-domain data for LM training is available. The recognizer LMs are adapted according to the recognition output on temporally preceding meetings, in either speaker-dependent or speaker-independent mode. Model adaptation is carried out by interpolating the n-gram probabilities of a large generic LM with those of a small LM estimated from the adaptation data, and minimizing perplexity on the automatic transcripts of a separate, previously recognized set of meetings. The adapted LMs yield about a 5.9% relative reduction in word error compared to the baseline. This improvement is about half of what can be achieved with supervised adaptation, i.e., using human-generated speech transcripts.
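The interpolation step described above can be illustrated with a minimal sketch. It assumes each LM has already assigned a probability to every token of the held-out automatic transcripts, and it estimates a single mixture weight by expectation-maximization, which is one common way to minimize held-out perplexity. The abstract does not say which optimizer was used, and the function names and toy probabilities below are hypothetical.

```python
import math

def perplexity(probs):
    """Perplexity of a token sequence given its per-token probabilities."""
    return math.exp(-sum(math.log(p) for p in probs) / len(probs))

def interpolate(p_generic, p_adapt, lam):
    """Linear interpolation: lam * adapted LM + (1 - lam) * generic LM."""
    return [lam * a + (1.0 - lam) * g for g, a in zip(p_generic, p_adapt)]

def best_mixture_weight(p_generic, p_adapt, iters=50, lam=0.5):
    """Estimate the weight on the adapted LM with EM (assumed optimizer)."""
    for _ in range(iters):
        # E-step: posterior that each token was generated by the adapted LM.
        posts = [lam * a / (lam * a + (1.0 - lam) * g)
                 for g, a in zip(p_generic, p_adapt)]
        # M-step: the new weight is the average posterior.
        lam = sum(posts) / len(posts)
    return lam

# Hypothetical per-token probabilities on held-out automatic transcripts.
p_generic = [0.010, 0.002, 0.050, 0.001, 0.020]  # large generic LM
p_adapt = [0.030, 0.020, 0.010, 0.015, 0.005]    # small adaptation LM

lam = best_mixture_weight(p_generic, p_adapt)
ppl_base = perplexity(p_generic)
ppl_mix = perplexity(interpolate(p_generic, p_adapt, lam))
print(f"lambda={lam:.3f}  baseline ppl={ppl_base:.1f}  adapted ppl={ppl_mix:.1f}")
```

Because the held-out log-likelihood is concave in the weight, EM converges to the weight that minimizes perplexity on that set. A real system would apply the resulting weight to the full n-gram models rather than to precomputed token probabilities.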
