Paper information - Preliminary experiments on English-Amharic statistical machine translation

This paper discusses a preliminary experiment on translation from English to Amharic using a statistical machine translation approach (EASMT). The EASMT system is trained on a corpus in both languages, built from expressions found in parallel documents. A total of 632 parliamentary documents were collected, of which 115 have been used in the experiment. The corpus spans 15 years, from August 21, 1995 to July 16, 2010. Each document contains English and Amharic text that are translations of each other. To measure the accuracy of the translation system, the experiment used 18,432 English-Amharic sentence pairs extracted from these documents. The baseline phrase-based system achieves a BLEU score of 35.32%. Applying morpheme segmentation to the tokens of the Amharic output and of the reference of the baseline system yields a 0.34% increase in BLEU. The increase is 0.92% when the baseline and the segmented system are both compared against the same segmented reference.
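As a rough illustration of why the token unit used for scoring matters, the following sketch computes an unsmoothed corpus-level BLEU (single reference, n-grams up to order 4) on word tokens and on morpheme-like tokens. It is not the scoring tool used in the paper, which the abstract does not name. The token strings, the `+` boundary marker, and the `segment` helper are hypothetical placeholders standing in for real Amharic data and a real morpheme segmenter.

```python
import math
from collections import Counter


def ngrams(tokens, n):
    """Return a Counter of all n-grams (as tuples) in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def corpus_bleu(hypotheses, references, max_n=4):
    """Unsmoothed corpus-level BLEU in [0, 1], one reference per hypothesis.

    hypotheses, references: lists of token lists, aligned by index.
    """
    matches = [0] * max_n  # clipped n-gram matches per order
    totals = [0] * max_n   # hypothesis n-gram counts per order
    hyp_len = ref_len = 0
    for hyp, ref in zip(hypotheses, references):
        hyp_len += len(hyp)
        ref_len += len(ref)
        for n in range(1, max_n + 1):
            hyp_counts = ngrams(hyp, n)
            ref_counts = ngrams(ref, n)
            # Clip each hypothesis n-gram count by its count in the reference.
            matches[n - 1] += sum(min(c, ref_counts[g]) for g, c in hyp_counts.items())
            totals[n - 1] += sum(hyp_counts.values())
    if hyp_len == 0 or min(matches) == 0:
        return 0.0  # no smoothing in this sketch
    log_precision = sum(math.log(m / t) for m, t in zip(matches, totals)) / max_n
    brevity_penalty = 1.0 if hyp_len > ref_len else math.exp(1 - ref_len / hyp_len)
    return brevity_penalty * math.exp(log_precision)


def segment(tokens, sep="+"):
    """Toy stand-in for a morpheme segmenter: split tokens on a boundary marker."""
    return [piece for tok in tokens for piece in tok.split(sep)]


# Hypothetical placeholder tokens; "+" marks an assumed morpheme boundary.
reference = ["root1+sfxA", "root2", "root3+sfxB", "root4", "root5"]
hypothesis = ["root1+sfxC", "root2", "root3+sfxB", "root4", "root5"]

word_bleu = corpus_bleu([hypothesis], [reference])
morph_bleu = corpus_bleu([segment(hypothesis)], [segment(reference)])
print(f"word-level BLEU:     {word_bleu:.4f}")
print(f"morpheme-level BLEU: {morph_bleu:.4f}")
```

The two scores differ even though the underlying output is the same, because the n-gram units differ. This is one reason a comparison between a baseline and a segmented system is more meaningful when both are scored against the same segmented reference.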

Laurent Besacier | Mulu Gebreegziabher Teshome | L. Besacier | Mulualem Teshome
