African speech technology (AST) telephone speech databases: corpus design and contents
The African Speech Technology project is developing telephone speech databases for five of South Africa’s eleven official languages, namely South African English, Afrikaans, and three African languages: Zulu, Xhosa, and Southern Sotho. These databases will be fully transcribed, both orthographically and phonetically, and will be used for the training and testing of phoneme-based, speaker-independent speech recognition systems. This paper describes the design and contents of the speech corpus that is currently being collected over both mobile and fixed networks. In particular, language coverage is discussed within the framework of the multilingual character of the South African population. Some language-specific differences with regard to the contents of the different databases are noted. Methods and tools applied in the acquisition of phonetic information are discussed.