Modeling Persian Verb Morphology to Improve English-Persian Machine Translation

Morphological analysis is essential when translating from a morphologically poor language such as English into a morphologically rich language such as Persian. In this paper, we first analyze the output of a rule-based machine translation (RBMT) system and categorize its errors. The error analysis shows that Persian morphology poses many challenges, especially in verb conjugation. We then apply a statistical approach to rich morphology prediction, trained on a parallel corpus, to improve the quality of the RBMT output. In our approach, we define a set of linguistic features using both English and Persian linguistic information obtained from an English-Persian parallel corpus, and build our model on these features. As a baseline in our experiments, we generate the inflected verb form with the most common feature values. Our experiments show an improvement of almost 2.6% absolute in BLEU score on a test set containing 16K sentences.
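The most-common-feature-value baseline mentioned above can be illustrated with a minimal sketch. The feature names (`tense`, `person`, `number`, `polarity`) and the toy training examples are assumptions made for illustration only; the abstract does not list the actual feature set, and this is not the authors' implementation. The sketch counts each feature independently and predicts the majority value for every verb.

```python
from collections import Counter

# Hypothetical training data: one bundle of morphological feature values
# per Persian verb. Feature names and values are illustrative assumptions.
TRAIN = [
    {"tense": "past", "person": "3", "number": "sg", "polarity": "pos"},
    {"tense": "present", "person": "3", "number": "sg", "polarity": "pos"},
    {"tense": "past", "person": "1", "number": "pl", "polarity": "neg"},
    {"tense": "past", "person": "3", "number": "sg", "polarity": "pos"},
]


def most_common_values(examples):
    """Return the most frequent value of each feature, counted independently."""
    counts = {}
    for example in examples:
        for feature, value in example.items():
            counts.setdefault(feature, Counter())[value] += 1
    return {feature: c.most_common(1)[0][0] for feature, c in counts.items()}


def baseline_predict(examples, n_verbs):
    """Predict the same majority feature bundle for each of n_verbs verbs."""
    default = most_common_values(examples)
    return [dict(default) for _ in range(n_verbs)]


if __name__ == "__main__":
    print(most_common_values(TRAIN))
```

A learned model would replace `baseline_predict` with a classifier that conditions on the English and Persian context features; the baseline ignores context entirely, which is what makes it a useful lower bound.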
