NUS at WMT09: Domain Adaptation Experiments for English-Spanish Machine Translation of News Commentary Text

We describe the system developed by the National University of Singapore team for English-to-Spanish machine translation of News Commentary text for the WMT09 Shared Translation Task. Our approach is based on domain adaptation: we built two separate phrase tables, one from a small in-domain News Commentary bi-text and one from a large out-of-domain bi-text drawn from the Europarl corpus, and combined them. We further combined two language models (in-domain and out-of-domain) and experimented with cognates, improved tokenization, and recasing. Among the English-to-Spanish systems at the shared task trained without additional external data, ours achieved the highest lowercased NIST score (6.963) and the second-best lowercased Bleu score (24.91%).
