Recent Progress in Development of Language Model for Slovak Large Vocabulary Continuous Speech Recognition

Speech technologies have the potential to simplify human-machine interaction as well as communication between people, and the use of speech technology applications is growing continuously. Every speech recognition system, which lies at the heart of any speech application, is algorithmically complex and also strongly language dependent. Therefore, one of the challenging tasks in developing a Slovak large vocabulary continuous speech recognition (LVCSR) system is the creation of an efficient language model (LM).
