Language Models with RNNs for Rescoring Hypotheses of Russian ASR

In this paper, we describe research on recurrent neural networks (RNNs) for language modeling in large-vocabulary continuous speech recognition of Russian. We experimented with recurrent neural networks with different numbers of units in the hidden layer. RNN-based and 3-gram language models (LMs) were trained on a text corpus of 350M words. The obtained RNN-based language models were then used to rescore N-best lists for automatic continuous Russian speech recognition. We also tested linear interpolation of the RNN LMs with the baseline 3-gram LM and achieved a 14% relative reduction in word error rate (WER) with respect to the baseline 3-gram model.
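To make the rescoring procedure concrete, the following is a minimal Python sketch of N-best list rescoring with a linearly interpolated language model. The helper functions `rnn_word_prob` and `ngram_word_prob` (returning per-word conditional probabilities under the two LMs), as well as the interpolation weight `lam` and the LM scale `lm_weight`, are illustrative assumptions and not parameters reported in the paper.

```python
import math

def interpolated_logprob(words, rnn_word_prob, ngram_word_prob, lam=0.5):
    """Sentence log-probability under a per-word linear interpolation:
    P(w | h) = lam * P_rnn(w | h) + (1 - lam) * P_3gram(w | h)."""
    logprob = 0.0
    history = []
    for w in words:
        p = lam * rnn_word_prob(w, history) + (1.0 - lam) * ngram_word_prob(w, history)
        logprob += math.log(p)
        history.append(w)
    return logprob

def rescore_nbest(nbest, rnn_word_prob, ngram_word_prob, lam=0.5, lm_weight=12.0):
    """Pick the hypothesis with the best combined acoustic + LM score.

    nbest: list of (words, acoustic_logprob) pairs produced by the
    first decoding pass; only the LM score is recomputed here.
    """
    best_words, best_score = None, float("-inf")
    for words, am_logprob in nbest:
        lm_logprob = interpolated_logprob(words, rnn_word_prob, ngram_word_prob, lam)
        score = am_logprob + lm_weight * lm_logprob
        if score > best_score:
            best_words, best_score = words, score
    return best_words
```

In this standard two-pass setup, the N-best lists come from a first decoding pass with the 3-gram LM; during rescoring the acoustic scores are reused unchanged, and only the language-model contribution is replaced by the interpolated score, so the more expensive RNN evaluation is restricted to a small set of candidate hypotheses.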
