Example-Based Paraphrasing for Improved Phrase-Based Statistical Machine Translation

This article proposes a new approach to improving phrase translation estimates. The proposal rests on two main ideas: first, that the most appropriate examples of a given phrase should contribute more to building its translation distribution; second, that paraphrases of the phrase can be used to estimate this distribution more accurately. Initial experiments provide evidence that the approach, as implemented, can effectively improve translation performance.
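The two ideas above can be illustrated with a minimal sketch. It is not the authors' method, only a toy instance under stated assumptions. Each training occurrence ("example") of a source phrase is assumed to be stored with its aligned target phrase and its surrounding source words. Example relevance is measured by a hypothetical Jaccard overlap between that context and the context of the sentence being translated. Paraphrases are assumed to come with a confidence score in [0, 1] that down-weights the examples they contribute. The function names and the toy French data are illustrative only.

```python
from collections import defaultdict


def context_similarity(ctx_a, ctx_b):
    """Hypothetical relevance weight: Jaccard overlap of two context word lists."""
    a, b = set(ctx_a), set(ctx_b)
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def weighted_translation_distribution(phrase, test_context, examples, paraphrases,
                                      similarity=context_similarity):
    """Estimate p(target | phrase) from context-weighted examples (a sketch).

    examples:    source phrase -> list of (target phrase, context words), one entry
                 per aligned occurrence in a (hypothetical) training corpus.
    paraphrases: source phrase -> list of (paraphrase, score) pairs; the score in
                 [0, 1] is an assumed confidence that scales borrowed examples.
    """
    scores = defaultdict(float)
    # The phrase's own examples count fully; a paraphrase's examples are scaled by its score.
    sources = [(phrase, 1.0)] + paraphrases.get(phrase, [])
    for source, source_weight in sources:
        for target, context in examples.get(source, []):
            # Examples whose context resembles the sentence being translated weigh more.
            scores[target] += source_weight * similarity(context, test_context)
    total = sum(scores.values())
    if total == 0:
        return {}  # no usable evidence for this phrase
    return {target: score / total for target, score in scores.items()}


examples = {
    "avocat": [("lawyer", ["tribunal", "client"]),
               ("avocado", ["salade", "mûr"]),
               ("lawyer", ["cabinet", "client"])],
    "homme de loi": [("lawyer", ["tribunal", "juge"])],
}
paraphrases = {"avocat": [("homme de loi", 0.5)]}
test_context = ["tribunal", "client"]

# Plain relative frequency: uniform weights, no paraphrases.
baseline = weighted_translation_distribution(
    "avocat", test_context, examples, {}, similarity=lambda a, b: 1.0)
# Context-weighted examples plus the paraphrase's down-weighted examples.
weighted = weighted_translation_distribution(
    "avocat", test_context, examples, paraphrases)
print(baseline)
print(weighted)
```

On this toy data the unweighted baseline gives "lawyer" two thirds of the mass and "avocado" one third, while the context-weighted estimate assigns all of it to "lawyer", because the only "avocado" example shares no context word with the test sentence; the paraphrase adds one further, down-weighted "lawyer" example. A realistic estimator would also need smoothing so that examples with zero context overlap do not drive a translation's probability to exactly zero.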
