Translating from Morphologically Complex Languages: A Paraphrase-Based Approach

We propose a novel approach to translating from a morphologically complex language. Unlike previous research, which has targeted word inflections and concatenations, we focus on the pairwise relationship between morphologically related words, which we treat as potential paraphrases and handle using paraphrasing techniques at the word, phrase, and sentence level. An important advantage of this framework is that it can cope with derivational morphology, which has so far remained largely beyond the capabilities of statistical machine translation systems. Our experiments translating from Malay, whose morphology is mostly derivational, into English show significant improvements over competing approaches according to five automatic evaluation measures (with 320,000 sentence pairs; 9.5 million English word tokens).
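To make the core idea concrete, the following is a minimal sketch of how morphologically related words might be paired up as paraphrase candidates. It is an illustration under stated assumptions, not the method of the paper: the affix lists, the `toy_stem` helper, the `MIN_STEM` threshold, and the sample vocabulary are all hypothetical, and a real system would rely on a proper Malay morphological analyzer rather than naive affix stripping.

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical, deliberately tiny affix lists for illustration only
# (not taken from the paper and far from complete for Malay).
PREFIXES = ("ber", "ter", "me", "di")
SUFFIXES = ("kan", "an", "i")
MIN_STEM = 4  # assumed guard against over-stripping short roots


def toy_stem(word):
    """Strip at most one prefix and one suffix, keeping >= MIN_STEM characters."""
    for prefix in PREFIXES:
        if word.startswith(prefix) and len(word) - len(prefix) >= MIN_STEM:
            word = word[len(prefix):]
            break
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= MIN_STEM:
            word = word[:-len(suffix)]
            break
    return word


def paraphrase_candidates(vocab):
    """Return unordered pairs of distinct words that share the same toy stem."""
    groups = defaultdict(set)
    for word in vocab:
        groups[toy_stem(word)].add(word)
    pairs = set()
    for words in groups.values():
        # Every pair inside a stem group is a potential word-level paraphrase.
        pairs.update(combinations(sorted(words), 2))
    return pairs


if __name__ == "__main__":
    vocab = ["makan", "makanan", "dimakan", "memakan", "buku"]
    for pair in sorted(paraphrase_candidates(vocab)):
        print(pair)
```

Such candidate pairs would only be a starting point: whether a pair is actually usable as a paraphrase in translation would still have to be decided by the downstream paraphrasing model.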
