A Pronoun Test Suite Evaluation of the English–German MT Systems at WMT 2018

We evaluate the output of 16 English-to-German MT systems with respect to the translation of pronouns in the context of the WMT 2018 competition. We work with a test suite specifically designed to assess system quality in various fine-grained categories known to be problematic. The main evaluation scores come from a semi-automatic process, combining automatic reference matching with extensive manual annotation of uncertain cases. We find that current NMT systems are good at translating pronouns with intra-sentential reference, but the inter-sentential cases remain difficult. NMT systems are also good at the translation of event pronouns, unlike systems from the phrase-based SMT paradigm. No single system performs best at translating all types of anaphoric pronouns, suggesting unexplained random effects influencing the translation of pronouns with NMT.
