Recurrent Neural Network Based Language Modeling in Meeting Recognition

We use recurrent neural network (RNN) based language models to improve the BUT English meeting recognizer. On the baseline setup using the original language models, we decrease the word error rate (WER) by more than 1% absolute through n-best list rescoring and language model adaptation. When the n-gram language models are trained on the same moderately sized data set as the RNN models, the improvements are larger, yielding a system that performs comparably to the baseline. A noticeable improvement was also observed with unsupervised adaptation of the RNN models. Furthermore, we examine the influence of word history on WER and show how to speed up rescoring by caching common prefix strings.

Index Terms: automatic speech recognition, language modeling, recurrent neural networks, rescoring, adaptation
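Prefix caching speeds up rescoring because hypotheses in an n-best list usually share long word prefixes. An RNN language model's state after a prefix depends only on that prefix, so the state can be computed once and reused. The sketch below is a minimal illustration under stated assumptions, not the authors' implementation. `ToyRNNLM` is a hypothetical stand-in with made-up deterministic weights. The cache maps each word prefix to the pair (hidden state, accumulated log-probability). The hypotheses are ranked by acoustic log-likelihood plus `lm_weight` times the RNN LM log-probability, where `lm_weight` is an assumed, untuned scale. The step counter shows how many recurrent steps the cache saves.

```python
import math
from typing import Dict, List, Tuple

Words = Tuple[str, ...]
State = List[float]


class ToyRNNLM:
    """Hypothetical Elman-style RNN LM with fixed, made-up weights."""

    def __init__(self, vocab: List[str], hidden_size: int = 4):
        self.vocab = vocab
        self.index = {w: i for i, w in enumerate(vocab)}
        self.hidden_size = hidden_size
        self.steps = 0  # number of recurrent steps actually computed

    def initial_state(self) -> State:
        return [0.0] * self.hidden_size

    def step(self, state: State, word: str) -> Tuple[State, float]:
        """Return (state after `word`, log P(word | history encoded in `state`))."""
        self.steps += 1
        # Output layer: one logit per vocabulary word from the current hidden state.
        logits = [sum(h * math.sin(j + 2 * k + 1) for j, h in enumerate(state))
                  for k in range(len(self.vocab))]
        m = max(logits)
        log_z = m + math.log(sum(math.exp(x - m) for x in logits))
        idx = self.index[word]
        # Recurrence: the new state depends on the old state and the input word.
        new_state = [math.tanh(0.5 * h + math.cos(j + 3 * idx))
                     for j, h in enumerate(state)]
        return new_state, logits[idx] - log_z


def score_plain(lm: ToyRNNLM, words: Words) -> float:
    """Score a sentence from scratch, with no reuse across hypotheses."""
    state, total = lm.initial_state(), 0.0
    for w in words:
        state, logp = lm.step(state, w)
        total += logp
    return total


def score_with_cache(lm: ToyRNNLM, words: Words,
                     cache: Dict[Words, Tuple[State, float]]) -> float:
    """Score a sentence, reusing (state, log-prob) for every prefix seen before."""
    state, total = lm.initial_state(), 0.0
    for i in range(len(words)):
        key = words[: i + 1]
        if key in cache:          # shared prefix: no RNN computation needed
            state, total = cache[key]
        else:                     # new prefix: one recurrent step, then remember it
            state, logp = lm.step(state, words[i])
            total += logp
            cache[key] = (state, total)
    return total


def rescore_nbest(lm: ToyRNNLM, nbest: List[Tuple[str, float]],
                  lm_weight: float = 10.0, use_cache: bool = True):
    """Return (hypothesis, combined score) pairs, best first."""
    cache: Dict[Words, Tuple[State, float]] = {}
    scored = []
    for text, am_logp in nbest:
        words = tuple(text.split()) + ("</s>",)  # include the end-of-sentence token
        lm_logp = (score_with_cache(lm, words, cache) if use_cache
                   else score_plain(lm, words))
        scored.append((text, am_logp + lm_weight * lm_logp))
    return sorted(scored, key=lambda p: p[1], reverse=True)


# Hypothetical 3-best list: (hypothesis, acoustic log-likelihood).
NBEST = [
    ("we decrease the error rate", -120.0),
    ("we decrease the error rates", -121.5),
    ("we decreased the error rate", -122.0),
]
VOCAB = sorted({w for t, _ in NBEST for w in t.split()} | {"</s>"})

if __name__ == "__main__":
    lm = ToyRNNLM(VOCAB)
    for text, score in rescore_nbest(lm, NBEST):
        print(f"{score:10.3f}  {text}")
    print("recurrent steps with prefix cache:", lm.steps)
```

In this toy list, the cache cuts the work from 18 recurrent steps to 13. The second hypothesis reuses the four-word prefix of the first. The third reuses only "we". Because every cached value comes from the same computation, the scores are identical with and without the cache. Only the amount of work changes.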