Domain-specific MT for Low-resource Languages: The case of Bambara-French

Translating to and from low-resource languages is a challenge for machine translation (MT) systems due to a lack of parallel data. In this paper, we address the issue of domain-specific MT for Bambara, an under-resourced Mande language spoken in Mali. We present the first domain-specific parallel dataset for MT between Bambara and French. We discuss the challenges of working with small quantities of domain-specific data for a low-resource language, and we present the results of machine learning experiments on this data.
