LESSONS LEARNED AFTER DEVELOPMENT AND USE OF A DATA COLLECTION APP FOR LANGUAGE DOCUMENTATION (LIG-AIKUMA)

Lig-Aikuma is a free Android app that runs on a range of mobile phones and tablets. It offers several speech collection modes (recording, respeaking, translation and elicitation) and lets users share recordings with one another. More than 250 hours of speech in 6 languages from sub-Saharan Africa (including 3 oral languages in the process of being documented) have already been collected with Lig-Aikuma. This paper presents the lessons learned after 3 years of development and use of the app. Although significant data collections were conducted, they were not without difficulties. Some mixed results lead us to stress the importance of design choices, the data sharing architecture and the user manual. We also discuss other potential uses of the app discovered during its deployment: data collection for language revitalisation, data collection for speech technology development (ASR), and enrichment of existing corpora through the addition of spoken comments.
