Collecting and Evaluating Speech Recognition Corpora for Nine Southern Bantu Languages

We describe the Lwazi corpus for automatic speech recognition (ASR), a new telephone speech corpus that contains data from nine Southern Bantu languages. Because of practical constraints, the amount of speech per language is small relative to major corpora for world languages; we therefore report on our investigation of the stability of the ASR models derived from the corpus. We also report phoneme distance measures across languages and describe initial phone recognisers developed using these data.
