Morphological constraints for phrase pivot statistical machine translation

The lack of parallel data for many language pairs is an important challenge for statistical machine translation (SMT). One common solution is to pivot through a third language that has parallel corpora with both the source and the target languages. Although pivoting is a robust technique, it introduces some low-quality translations, especially when a morphologically poor language is used as the pivot between morphologically rich languages. In this paper, we examine the use of synchronous morphology constraint features to improve the quality of phrase pivot SMT. We compare hand-crafted constraints to constraints learned from limited parallel data between the source and target languages. The learned morphology constraints are based on projected alignments between the source and target phrases in the pivot phrase table. We show positive results on Hebrew-Arabic SMT (pivoting on English): we gain 1.5 BLEU points over a phrase pivot baseline and 0.8 BLEU points over a system combination baseline with a direct model built from parallel data.
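
To make the pivoting step and the idea of projected alignments concrete, here is a minimal sketch. It assumes the standard triangulation estimate p(t|s) = Σ_p p(t|p) · p(p|s) for building a pivot phrase table. It projects word alignments by composing source-pivot and pivot-target links through shared pivot words. Everything else is hypothetical and chosen for illustration only: the phrase-table layout, the function names (`triangulate`, `project_alignment`, `agreement_feature`), the toy romanized Hebrew/Arabic entries, and the gender/number agreement check. The agreement score is a simple hand-crafted constraint, not the learned features or the exact features of the system described here.

```python
"""Illustrative sketch: phrase-pivot triangulation, alignment projection, and a
toy morphology-agreement feature. Data layout, names and the feature definition
are assumptions for illustration, not the paper's implementation."""
from collections import defaultdict

# Hypothetical phrase-table layout: (phrase_a, phrase_b) -> (p(b|a), alignment),
# where phrases are tuples of tokens and an alignment is a frozenset of (i, j) pairs.


def project_alignment(a_sp, a_pt):
    """Link source word i to target word k if both align to the same pivot word j."""
    return frozenset((i, k) for (i, j) in a_sp for (j2, k) in a_pt if j == j2)


def triangulate(src_piv, piv_tgt):
    """Build (s, t) -> (p(t|s), alignment) with p(t|s) = sum_p p(t|p) * p(p|s).

    When several pivot phrases connect s and t, this sketch keeps the union
    of their projected alignments.
    """
    # Index pivot-target entries by pivot phrase for fast lookup.
    by_pivot = defaultdict(list)
    for (p, t), (prob, align) in piv_tgt.items():
        by_pivot[p].append((t, prob, align))

    table = {}
    for (s, p), (p_ps, a_sp) in src_piv.items():
        for t, p_tp, a_pt in by_pivot.get(p, []):
            prob, align = table.get((s, t), (0.0, frozenset()))
            table[(s, t)] = (prob + p_tp * p_ps, align | project_alignment(a_sp, a_pt))
    return table


def agreement_feature(s, t, align, src_morph, tgt_morph, attrs=("gender", "number")):
    """Hypothetical hand-crafted constraint: fraction of projected word links
    whose morphological attributes all agree. Links with a missing analysis
    are skipped; a pair with no checkable links scores 1.0."""
    checked = agreed = 0
    for i, k in align:
        fs, ft = src_morph.get(s[i]), tgt_morph.get(t[k])
        if fs is None or ft is None:
            continue
        checked += 1
        agreed += all(fs.get(a) == ft.get(a) for a in attrs)
    return agreed / checked if checked else 1.0


# Toy data (made up): Hebrew "gdola" (big, fem.) pivots through English "big",
# which does not mark gender, to Arabic "kabir" (masc.) and "kabira" (fem.).
src_piv = {(("gdola",), ("big",)): (1.0, frozenset({(0, 0)}))}
piv_tgt = {
    (("big",), ("kabir",)): (0.6, frozenset({(0, 0)})),
    (("big",), ("kabira",)): (0.4, frozenset({(0, 0)})),
}
src_morph = {"gdola": {"gender": "f", "number": "sg"}}
tgt_morph = {
    "kabir": {"gender": "m", "number": "sg"},
    "kabira": {"gender": "f", "number": "sg"},
}

table = triangulate(src_piv, piv_tgt)
for (s, t), (prob, align) in sorted(table.items()):
    print(t[0], prob, agreement_feature(s, t, align, src_morph, tgt_morph))
# prints:
# kabir 0.6 0.0
# kabira 0.4 1.0
```

In this toy run, the pivot alone prefers the masculine adjective because English carries no gender; only the morphology check separates the two candidates. In a real system, such a score would be one more log-linear feature tuned alongside the triangulated translation probabilities.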
