Amharic-English Speech Translation in Tourism Domain

This paper describes Amharic-to-English speech translation, in particular Automatic Speech Recognition (ASR) with a post-editing feature, and Amharic-English Statistical Machine Translation (SMT). The ASR experiments use a morpheme-based language model (LM) and a phoneme-based acoustic model (AM). Likewise, the SMT experiments use both words and morphemes as translation units. Morpheme-based translation achieves a BLEU score of 6.29 at 76.4% recognition accuracy, whereas word-based translation achieves a BLEU score of 12.83 at 77.4% word recognition accuracy. Furthermore, post-editing the Amharic ASR output with a corpus-based n-gram model increases word recognition accuracy by 1.42%. Because post-editing reduces error propagation, word-based translation improves by 0.25 BLEU points, a 1.95% relative gain over the 12.83 baseline. We are now working to further reduce propagated errors by applying different algorithms at each component of the cascaded speech translation pipeline.
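The abstract reports ASR quality as word recognition accuracy but does not name the scoring tool. The following is a minimal sketch that assumes the common definition, accuracy = 100 × (1 − WER). Under that definition, WER counts substitutions, deletions and insertions from a minimum edit-distance alignment over words. The function names `word_edit_distance` and `word_accuracy` are hypothetical and are not taken from the authors' system.

```python
def word_edit_distance(ref: list[str], hyp: list[str]) -> int:
    """Minimum number of substitutions + deletions + insertions turning ref into hyp."""
    # prev[j] holds the distance between the first i-1 reference words
    # and the first j hypothesis words (one row of the Levenshtein table).
    prev = list(range(len(hyp) + 1))
    for i in range(1, len(ref) + 1):
        curr = [i] + [0] * len(hyp)
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            curr[j] = min(
                prev[j] + 1,         # deletion of a reference word
                curr[j - 1] + 1,     # insertion of a hypothesis word
                prev[j - 1] + cost,  # match (cost 0) or substitution (cost 1)
            )
        prev = curr
    return prev[-1]


def word_accuracy(ref: str, hyp: str) -> float:
    """Word recognition accuracy in percent, assumed here to be 100 * (1 - WER)."""
    ref_words, hyp_words = ref.split(), hyp.split()
    if not ref_words:
        raise ValueError("reference must contain at least one word")
    errors = word_edit_distance(ref_words, hyp_words)
    # Can go negative when the hypothesis has many insertions, as with HTK-style accuracy.
    return 100.0 * (len(ref_words) - errors) / len(ref_words)


if __name__ == "__main__":
    # Placeholder tokens: one substitution (b -> x) and one deletion (d), so 2 errors out of 5 words.
    print(word_accuracy("a b c d e", "a x c e"))
```

Words are separated on whitespace, so passing morpheme-segmented transcripts to the same routine yields a morpheme-level accuracy rather than a word-level one. This is one way the word and morpheme figures in the abstract could differ for the same audio.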
