ASR domain adaptation methods for low-resourced languages: Application to Romanian language

This study investigates whether statistical machine translation can be used to create domain-specific language resources. We propose a methodology for building a domain-specific automatic speech recognition system for a low-resourced language when in-domain text corpora are available only in a high-resourced language. We evaluate a new semi-supervised method and compare it with previously developed semi-supervised and unsupervised approaches. In addition, as part of the effort to build an out-of-domain language model for Romanian, we introduce and experiment with an effective diacritics restoration algorithm.
