Paper information - Spoken and Written Language Resources for Vietnamese

This paper presents an overview of our work on spoken and written language resources for Vietnamese, carried out at the CLIPS-IMAG Laboratory and the International Research Center MICA. A new methodology for fast text corpus acquisition for minority languages is proposed and applied to Vietnamese. The first results from building a large Vietnamese speech database (VNSpeechCorpus) and a phonetic dictionary, which is used in the automatic alignment process, are also presented.
