Using LSTM Networks to Translate French to Senegalese Local Languages: Wolof as a Case Study

In this paper, we propose a neural machine translation system for Wolof, a low-resource Niger-Congo language. We first gathered a parallel corpus of 70,000 aligned French-Wolof sentences. We then developed a baseline LSTM-based encoder-decoder architecture, which we further extended to bidirectional LSTMs with attention mechanisms. Our models are trained on a limited amount of parallel French-Wolof data, approximately 35,000 sentence pairs. Experimental results on French-Wolof translation tasks show that our approach produces promising translations under extremely low-resource conditions. The best model achieved a BLEU score of 47%.
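To make the architecture concrete, here is a minimal, framework-free sketch of the forward pass of a bidirectional LSTM encoder followed by one attention step. It is not the authors' implementation: the abstract does not state the gate layout, dimensions, or attention scoring function, so this sketch assumes a standard LSTM cell, Luong-style dot-product attention, toy sizes, and random weights. All names (`LSTMCell`, `bidirectional_encode`, `dot_attention`) are hypothetical, and training and the decoder LSTM are omitted; the decoder state is a random placeholder.

```python
import math
import random
from typing import List, Tuple

Vector = List[float]
Matrix = List[List[float]]


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def matvec(m: Matrix, v: Vector) -> Vector:
    # Multiply a (rows x cols) matrix by a length-cols vector.
    return [sum(w * x for w, x in zip(row, v)) for row in m]


def rand_matrix(rng: random.Random, rows: int, cols: int) -> Matrix:
    return [[rng.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]


class LSTMCell:
    """Standard LSTM cell; hypothetical toy version with random weights."""

    def __init__(self, input_dim: int, hidden_dim: int, rng: random.Random):
        self.hidden_dim = hidden_dim
        # Stacked gate weights, rows ordered [input; forget; candidate; output].
        self.W = rand_matrix(rng, 4 * hidden_dim, input_dim)
        self.U = rand_matrix(rng, 4 * hidden_dim, hidden_dim)
        self.b = [0.0] * (4 * hidden_dim)

    def step(self, x: Vector, h: Vector, c: Vector) -> Tuple[Vector, Vector]:
        z = [a + u + bias for a, u, bias in zip(matvec(self.W, x), matvec(self.U, h), self.b)]
        n = self.hidden_dim
        i = [sigmoid(v) for v in z[0:n]]            # input gate
        f = [sigmoid(v) for v in z[n:2 * n]]        # forget gate
        g = [math.tanh(v) for v in z[2 * n:3 * n]]  # candidate cell values
        o = [sigmoid(v) for v in z[3 * n:4 * n]]    # output gate
        c_new = [fi * ci + ii * gi for fi, ci, ii, gi in zip(f, c, i, g)]
        h_new = [oi * math.tanh(ci) for oi, ci in zip(o, c_new)]
        return h_new, c_new


def run_lstm(cell: LSTMCell, inputs: List[Vector]) -> List[Vector]:
    # Unroll the cell over the sequence from zero initial states.
    h, c = [0.0] * cell.hidden_dim, [0.0] * cell.hidden_dim
    states = []
    for x in inputs:
        h, c = cell.step(x, h, c)
        states.append(h)
    return states


def bidirectional_encode(fwd: LSTMCell, bwd: LSTMCell, inputs: List[Vector]) -> List[Vector]:
    forward_states = run_lstm(fwd, inputs)
    # Run the backward LSTM on the reversed sequence, then re-reverse so position t lines up.
    backward_states = run_lstm(bwd, inputs[::-1])[::-1]
    # Each encoder state is [forward ; backward], of size 2 * hidden_dim.
    return [hf + hb for hf, hb in zip(forward_states, backward_states)]


def dot_attention(query: Vector, keys: List[Vector]) -> Tuple[Vector, Vector]:
    # Dot-product scores, softmax-normalised (max-shifted for stability) into weights.
    scores = [sum(q * k for q, k in zip(query, key)) for key in keys]
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    weights = [e / total for e in exps]
    # Context vector: attention-weighted sum of the encoder states.
    context = [sum(w * key[d] for w, key in zip(weights, keys)) for d in range(len(query))]
    return weights, context


# Toy usage: 5 random "token embeddings" stand in for a French source sentence.
rng = random.Random(0)
emb_dim, hidden = 8, 16
source = [[rng.uniform(-1, 1) for _ in range(emb_dim)] for _ in range(5)]
fwd, bwd = LSTMCell(emb_dim, hidden, rng), LSTMCell(emb_dim, hidden, rng)
enc = bidirectional_encode(fwd, bwd, source)
decoder_state = [rng.uniform(-1, 1) for _ in range(2 * hidden)]  # placeholder decoder state
weights, context = dot_attention(decoder_state, enc)
```

The bidirectional encoder lets each source position see both its left and right context, and the attention weights let the decoder focus on different source positions at each target step instead of relying on a single fixed-size summary vector.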
