Parallel Corpora for bi-Directional Statistical Machine Translation for Seven Ethiopian Language Pairs

In this paper, we describe the development of parallel corpora for Ethiopian languages: Amharic, Tigrigna, Afan-Oromo, Wolaytta, and Ge’ez. To check the usability of the corpora, we conducted baseline bi-directional statistical machine translation (SMT) experiments for seven language pairs. The performance of the bi-directional SMT systems shows that all the corpora can be used for further investigation. The experiments also show that the morphological complexity of the Ethio-Semitic languages has a negative impact on SMT performance, especially when they are the target languages. Based on these results, we are currently working on handling this morphological complexity to improve the performance of statistical machine translation among the Ethiopian languages.

* This work is licensed under a Creative Commons Attribution 4.0 International License. License details: http://creativecommons.org/licenses/by/4.0/
