Improve SMT Quality with Automatically Extracted Paraphrase Rules

We propose a novel approach to improving statistical machine translation (SMT) with paraphrase rules that are automatically extracted from the bilingual training data. Without using any extra paraphrase resources, we acquire the rules by comparing the source side of the parallel corpus with the target-to-source translations of its target side. Besides word and phrase paraphrases, the acquired rules mainly cover structured paraphrases at the sentence level. The rules are used to enrich the SMT inputs and thereby improve translation quality. Experimental results show that the proposed approach achieves significant improvements of 1.6 to 3.6 BLEU points in the oral domain and 0.5 to 1 BLEU points in the news domain.
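
The comparison step described above can be illustrated with a small sketch. It is a toy stand-in, not the extraction procedure of the paper: it assumes the target-to-source translations are already available as plain strings, it aligns each source sentence with its back-translation using Python's standard `difflib.SequenceMatcher` rather than an SMT word alignment, and it yields only surface phrase pairs, not the structured sentence-level rules the abstract mentions. The function names `extract_candidate_rules` and `paraphrase_input`, the `min_count` threshold, and the direction of the rules (source wording to back-translated wording) are all assumptions made for illustration.

```python
from collections import Counter
from difflib import SequenceMatcher


def extract_candidate_rules(source_sentences, back_translations, min_count=1):
    """Collect (source phrase, back-translated phrase) pairs that differ.

    Hypothetical helper: a token-level diff stands in for the comparison
    between the source side and the target-to-source translations.
    """
    counts = Counter()
    for src, bt in zip(source_sentences, back_translations):
        src_tokens, bt_tokens = src.split(), bt.split()
        matcher = SequenceMatcher(a=src_tokens, b=bt_tokens, autojunk=False)
        for tag, i1, i2, j1, j2 in matcher.get_opcodes():
            # A "replace" span is a stretch where the two sentences disagree;
            # treat the two sides as a candidate paraphrase pair.
            if tag == "replace":
                lhs = " ".join(src_tokens[i1:i2])
                rhs = " ".join(bt_tokens[j1:j2])
                counts[(lhs, rhs)] += 1
    # Keep only pairs seen at least min_count times (assumed filter).
    return {pair: n for pair, n in counts.items() if n >= min_count}


def paraphrase_input(sentence, rules):
    """Return the input sentence plus one variant per applicable rule."""
    variants = [sentence]
    padded = f" {sentence} "  # pad so rules match whole tokens only
    for lhs, rhs in rules:
        needle = f" {lhs} "
        if needle in padded:
            variants.append(padded.replace(needle, f" {rhs} ", 1).strip())
    return variants


if __name__ == "__main__":
    sources = ["what time does the shop open", "what time does the bank open"]
    back = ["when does the shop open", "when does the bank open"]
    rules = extract_candidate_rules(sources, back)
    print(rules)  # {('what time', 'when'): 2}
    print(paraphrase_input("what time does the museum open", rules))
    # ['what time does the museum open', 'when does the museum open']
```

In this sketch the enriched input is simply a list of alternative sentences; how such alternatives are actually passed to the decoder is not specified in the abstract and is left open here.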
