Paper information - First steps in building a large vocabulary continuous speech recognition system for Vietnamese

This paper presents an overview of our work on building a Large Vocabulary Continuous Speech Recognition (LVCSR) system for Vietnamese, carried out at the CLIPS-IMAG Laboratory (France) and the International Research Center MICA (Vietnam). First, we propose a new methodology for fast text corpus acquisition for minority languages and apply it to Vietnamese. Second, we present the first results of building a large speech corpus for Vietnamese (VNSpeechCorpus) and a phonetic dictionary, which is used for the automatic alignment process. Finally, we construct a language model and an acoustic model to obtain an LVCSR system for Vietnamese.

Index Terms: Automatic Speech Recognition, LVCSR, text corpus, speech corpus, pronunciation dictionary, language modeling, acoustic modeling.

Jean-François Serignat | Laurent Besacier | Eric Castelli
