Using Paraphrases for Parameter Tuning in Statistical Machine Translation

Most state-of-the-art statistical machine translation systems use log-linear models, which are defined in terms of hypothesis features and weights for those features. It is standard to tune the feature weights in order to maximize a translation quality metric, using held-out test sentences and their corresponding reference translations. However, obtaining reference translations is expensive. In this paper, we introduce a new full-sentence paraphrase technique, based on English-to-English decoding with an MT system, and we demonstrate that the resulting paraphrases can be used to drastically reduce the number of human reference translations needed for parameter tuning, without a significant decrease in translation quality.

[1]  Hermann Ney,et al.  Improved Statistical Alignment Models , 2000, ACL.

[2]  Dekang Lin,et al.  DIRT – Discovery of Inference Rules from Text , 2001 .

[3]  Patrick Pantel,et al.  DIRT @SBT@discovery of inference rules from text , 2001, KDD '01.

[4]  Hermann Ney,et al.  Discriminative Training and Maximum Entropy Models for Statistical Machine Translation , 2002, ACL.

[5]  Salim Roukos,et al.  Bleu: a Method for Automatic Evaluation of Machine Translation , 2002, ACL.

[6]  Regina Barzilay,et al.  Learning to Paraphrase: An Unsupervised Approach Using Multiple-Sequence Alignment , 2003, NAACL.

[7]  Franz Josef Och,et al.  Minimum Error Rate Training in Statistical Machine Translation , 2003, ACL.

[8]  Jimmy J. Lin,et al.  Extracting Structural Paraphrases from Aligned Monolingual Corpora , 2003, IWP@ACL.

[9]  Daniel Marcu,et al.  Statistical Phrase-Based Translation , 2003, NAACL.

[10]  Daniel Marcu,et al.  Syntax-based Alignment of Multiple Translations: Extracting Paraphrases and Generating New Sentences , 2003, NAACL.

[11]  Douglas W. Oard,et al.  The surprise language exercises , 2003, TALIP.

[12]  Chris Quirk,et al.  Monolingual Machine Translation for Paraphrase Generation , 2004, EMNLP.

[13]  Hermann Ney,et al.  The Alignment Template Approach to Statistical Machine Translation , 2004, CL.

[14]  Philipp Koehn,et al.  Statistical Significance Tests for Machine Translation Evaluation , 2004, EMNLP.

[15]  Alon Lavie,et al.  METEOR: An Automatic Metric for MT Evaluation with Improved Correlation with Human Judgments , 2005, IEEvaluation@ACL.

[16]  Chris Callison-Burch,et al.  Paraphrasing with Bilingual Parallel Corpora , 2005, ACL.

[17]  Alon Lavie,et al.  MT for Minority Languages Using Elicitation-Based Learning of Syntactic Transfer Rules , 2002, Machine Translation.

[18]  Ralph Weischedel,et al.  A STUDY OF TRANSLATION ERROR RATE WITH TARGETED HUMAN ANNOTATION , 2005 .

[19]  Jimmy J. Lin,et al.  A Paraphrase-Based Approach to Machine Translation Evaluation , 2005 .

[20]  Nicola Ueffing,et al.  Using monolingual source-language data to improve MT performance , 2006, IWSLT.

[21]  Dragos Stefan Munteanu,et al.  ParaEval: Using Paraphrases to Evaluate Summaries Automatically , 2006, NAACL.

[22]  Matthew G. Snover,et al.  A Study of Translation Edit Rate with Targeted Human Annotation , 2006, AMTA.

[23]  Tong Zhang,et al.  A Discriminative Global Training Algorithm for Statistical MT , 2006, ACL.

[24]  Philipp Koehn,et al.  Improved Statistical Machine Translation Using Paraphrases , 2006, NAACL.

[25]  Mark Liberman,et al.  Integrated Linguistic Resources for Language Exploitation Technologies , 2006, LREC.

[26]  David Chiang,et al.  Hierarchical Phrase-Based Translation , 2007, CL.

[27]  Adam Lopez,et al.  Statistical machine translation , 2007, CSUR.