PROTEST: A Test Suite for Evaluating Pronouns in Machine Translation

We present PROTEST, a test suite for the evaluation of pronoun translation by machine translation (MT) systems. The test suite comprises 250 hand-selected pronoun tokens and an automatic evaluation method that compares the translations of pronouns in MT output with those in the reference translation. Pronoun translations that do not match the reference are referred for manual evaluation. PROTEST is designed to support analysis of system performance at the level of individual pronoun groups, rather than to provide a single aggregate measure over all pronouns. We wish to encourage detailed analyses that highlight issues in the handling of specific linguistic mechanisms by MT systems, thereby contributing to a better understanding of the problems involved in translating pronouns. We present two use cases for PROTEST: a) measuring the improvement or degradation that results from an incremental system change, and b) comparing the performance of a group of systems whose designs may be largely unrelated. Following the latter use case, we demonstrate the application of PROTEST to the evaluation of the systems submitted to the DiscoMT 2015 shared task on pronoun translation.
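The workflow described above (match pronoun translations against the reference, refer mismatches for manual evaluation, and report per pronoun group rather than in aggregate) can be illustrated with a minimal sketch. This is not the PROTEST implementation: the `PronounExample` record, the `evaluate` function, the group labels, and the case-insensitive string comparison are all assumptions made for illustration only.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class PronounExample:
    """One test-suite pronoun token (hypothetical record layout)."""
    group: str       # pronoun group label; the labels used below are made up
    reference: str   # pronoun translation found in the reference translation
    mt_output: str   # pronoun translation found in the MT output


def evaluate(examples):
    """Count reference matches per group and collect mismatches for manual review.

    Assumption: a "match" is a case-insensitive exact string match.
    """
    counts = defaultdict(lambda: {"match": 0, "manual": 0})
    manual_queue = []
    for ex in examples:
        if ex.mt_output.strip().lower() == ex.reference.strip().lower():
            counts[ex.group]["match"] += 1
        else:
            # A mismatch is not necessarily an error, so it is queued
            # for a human annotator instead of being scored as wrong.
            counts[ex.group]["manual"] += 1
            manual_queue.append(ex)
    return dict(counts), manual_queue


if __name__ == "__main__":
    # Toy English-to-French data, invented for this sketch.
    examples = [
        PronounExample("group_a", "elle", "elle"),
        PronounExample("group_a", "il", "elle"),
        PronounExample("group_b", "cela", "Cela"),
    ]
    counts, manual_queue = evaluate(examples)
    for group, c in sorted(counts.items()):
        print(group, c)
    print("referred for manual evaluation:", len(manual_queue))
```

Keeping the counts keyed by group, instead of collapsing them into one score, mirrors the stated design goal of analysing performance per pronoun group. Running the same function on the outputs of two systems would give the per-group figures needed for either use case.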
