A Semantic Evaluation of Machine Translation Lexical Choice

While automatic metrics of translation quality are invaluable for machine translation research, a deeper understanding of translation errors requires more focused evaluations designed to target specific aspects of translation quality. We show that Word Sense Disambiguation (WSD) can be used to evaluate the quality of machine translation lexical choice, by applying a standard phrase-based SMT system to the SemEval-2010 Cross-Lingual WSD task. This case study reveals that the SMT system does not perform as well as a WSD system trained on exactly the same parallel data, and that local context models based on source phrases and target n-grams are much weaker representations of context than the simple templates used by the WSD system.
