Spoken and Written Language Resources for Vietnamese

This paper presents an overview of our work on spoken and written language resources for Vietnamese, carried out at the CLIPS-IMAG Laboratory and the International Research Center MICA. We propose a new methodology for the fast acquisition of text corpora for minority languages and apply it to Vietnamese. We also present the first results of building a large Vietnamese speech database (VNSpeechCorpus) and a phonetic dictionary, which is used in the automatic alignment process.