Paper information - Collecting Resources in Sub-Saharan African Languages for Automatic Speech Recognition: a Case Study of Wolof

This article presents the data collected and the ASR systems developed for four Sub-Saharan African languages (Swahili, Hausa, Amharic and Wolof). To illustrate our methodology, we focus on Wolof, a very under-resourced language for which we designed the first ASR system ever built. All data and scripts are available online in our GitHub repository.
