Automatic Speech Recognition for African Languages with Vowel Length Contrast

Abstract This paper addresses automatic speech recognition (ASR) for two languages, Hausa and Wolof. Both exhibit vowel length contrast: the phoneme inventory of each language contains short and long versions of the same vowel. We expect that taking this contrast into account in ASR models might help, and this pilot study investigates that hypothesis. The experimental results show that while both approaches (modeling vowel length contrast or not) lead to similar results, their combination slightly improves ASR performance. As a by-product of ASR system design, we also show that the resulting acoustic models allow a large-scale analysis of vowel length contrast for phonetic studies.
