Class-based LSTM Russian Language Model with Linguistic Information

In this paper, we present class-based LSTM language models (LMs) for Russian in which the word classes are generated using both word frequency and linguistic information obtained with the “VisualSynan” software from the AOT project. We built LSTM LMs with various numbers of classes and compared them with a word-based LM and with a class-based LM whose classes were generated using word2vec, in terms of perplexity, training time, and word error rate (WER). In addition, we linearly interpolated the LSTM LMs with the baseline 3-gram LM. The LSTM LMs were used for very large vocabulary continuous Russian speech recognition at the N-best list rescoring stage. We achieved a significant reduction in training time with only a slight degradation in recognition accuracy compared to the word-based LM. In addition, our LM with classes generated using linguistic information outperformed the LM with classes generated using word2vec. We achieved a WER of 14.94 % on our own corpus of continuous Russian speech, which is a 15 % relative reduction with respect to the baseline 3-gram model.
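
The interpolation and rescoring steps mentioned above can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the hypothesis layout (`am_score`, `lstm`, `ngram`), the interpolation weight `lam`, and the LM scale `lm_scale` are all assumptions made for illustration. Each word probability is interpolated as P(w|h) = λ·P_LSTM(w|h) + (1 − λ)·P_3gram(w|h), and the N-best hypothesis with the highest combined acoustic and LM score is selected.

```python
import math


def interpolate_logprob(lstm_logprob, ngram_logprob, lam=0.5):
    """Linearly interpolate two LM probabilities given as natural-log values.

    The mixing is done in the probability domain; the result is returned
    as a natural-log probability. `lam` is the (assumed) LSTM weight.
    """
    p = lam * math.exp(lstm_logprob) + (1.0 - lam) * math.exp(ngram_logprob)
    return math.log(p)


def rescore_nbest(hypotheses, lam=0.5, lm_scale=1.0):
    """Return the hypothesis with the best acoustic + interpolated LM score.

    Each hypothesis is a dict (hypothetical layout) with:
      'text'     - the word sequence,
      'am_score' - acoustic log-score,
      'lstm'     - per-word log-probabilities from the LSTM LM,
      'ngram'    - per-word log-probabilities from the 3-gram LM.
    """
    def total_score(hyp):
        lm_logprob = sum(
            interpolate_logprob(a, b, lam)
            for a, b in zip(hyp["lstm"], hyp["ngram"])
        )
        return hyp["am_score"] + lm_scale * lm_logprob

    return max(hypotheses, key=total_score)


# Toy 2-best list with made-up scores, for illustration only.
nbest = [
    {"text": "hyp a", "am_score": -10.0,
     "lstm": [math.log(0.5), math.log(0.5)],
     "ngram": [math.log(0.5), math.log(0.5)]},
    {"text": "hyp b", "am_score": -9.5,
     "lstm": [math.log(0.1), math.log(0.1)],
     "ngram": [math.log(0.1), math.log(0.1)]},
]
best = rescore_nbest(nbest, lam=0.5)
```

In practice the interpolation weight and the LM scale would be tuned on held-out data; the values here are placeholders.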
