Rescoring n-best lists for Russian speech recognition using factored language models

In this paper, we present a study of factored language models (FLMs) for rescoring N-best lists in a Russian speech recognition task. As the baseline, we used a 3-gram language model. Both the baseline and the factored language models were trained on a text corpus collected from recent news texts published on the websites of online newspapers; the corpus contains about 350 million words (2.4 GB of data). To build the FLMs, we used five factors: the word, its lemma, stem, part of speech, and morphological tag. We investigate the influence of the factor set on language model perplexity and word error rate (WER). Experiments on large-vocabulary continuous Russian speech recognition showed that FLMs can reduce WER.
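
As a rough illustration of what N-best rescoring involves, the sketch below re-ranks a list of hypotheses by adding a weighted language model log-score to each hypothesis's acoustic log-score. It is a minimal sketch under stated assumptions, not the setup used in the paper: the names `rescore_nbest` and `toy_lm_logprob`, the `lm_weight` and `word_penalty` parameters, and all scores are hypothetical, and the toy unigram table merely stands in for a trained FLM.

```python
from typing import Callable, List, Tuple

# A hypothesis is (words, acoustic log-score); both values here are made up.
Hypothesis = Tuple[List[str], float]


def rescore_nbest(
    nbest: List[Hypothesis],
    lm_logprob: Callable[[List[str]], float],
    lm_weight: float = 1.0,
    word_penalty: float = 0.0,
) -> List[Tuple[float, List[str]]]:
    """Return (combined score, words) pairs sorted best first.

    combined = acoustic + lm_weight * LM log-prob + word_penalty * length
    (a common log-linear combination; assumed here, not taken from the paper).
    """
    scored = []
    for words, acoustic in nbest:
        total = acoustic + lm_weight * lm_logprob(words) + word_penalty * len(words)
        scored.append((total, words))
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored


def toy_lm_logprob(words: List[str]) -> float:
    # Hypothetical stand-in for an FLM: a unigram table with a floor
    # log-probability of -5.0 for unseen words.
    table = {"мама": -1.0, "мыла": -1.5, "раму": -2.0}
    return sum(table.get(w, -5.0) for w in words)


# Two competing hypotheses; the first has the better acoustic score
# but contains a word the toy language model considers unlikely.
nbest = [
    (["мама", "мыло", "раму"], -100.0),
    (["мама", "мыла", "раму"], -101.0),
]

best_score, best_words = rescore_nbest(nbest, toy_lm_logprob)[0]
print(best_words, best_score)
```

In this toy case the second hypothesis wins after rescoring, because its language model score outweighs its slightly worse acoustic score. In practice the weights would be tuned on held-out data.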
