Pronoun Translation in English-French Machine Translation: An Analysis of Error Types

Pronouns are a long-standing challenge in machine translation. We present a study of the performance of a range of rule-based, statistical and neural MT systems on pronoun translation based on an extensive manual evaluation using the PROTEST test suite, which enables a fine-grained analysis of different pronoun types and sheds light on the difficulties of the task. We find that the rule-based approaches in our corpus perform poorly as a result of oversimplification, whereas SMT and early NMT systems exhibit significant shortcomings due to a lack of awareness of the functional and referential properties of pronouns. A recent Transformer-based NMT system with cross-sentence context shows very promising results on non-anaphoric pronouns and intra-sentential anaphora, but there is still considerable room for improvement in examples with cross-sentence dependencies.
