Divide and Rule: Training Context-Aware Multi-Encoder Translation Models with Little Resources

Multi-encoder models are a broad family of context-aware Neural Machine Translation (NMT) systems that aim to improve translation quality by encoding document-level contextual information alongside the current sentence. The context encoding is performed by contextual parameters, trained on document-level data. In this work, we show that training these parameters requires a large amount of data, since the contextual training signal is sparse. We propose an efficient alternative, based on splitting sentence pairs, which enriches the training signal of a set of parallel sentences by breaking intra-sentential syntactic links, thus frequently forcing the model to search the context for disambiguating clues. We evaluate our approach with BLEU and contrastive test sets, showing that it allows multi-encoder models to achieve performance comparable to a setting where they are trained with 10× more document-level data. We also show that our approach is a viable option for context-aware NMT in language pairs with zero document-level parallel data.
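To make the splitting idea concrete, the sketch below shows one way such an augmentation could be implemented. It is a minimal illustration under stated assumptions: the helper split_pair, the midpoint heuristic, and the use of word alignments (e.g., produced by fast_align) are our own illustrative choices, not necessarily the paper's exact procedure.

# Minimal sketch of sentence-pair splitting for context-aware NMT training.
# NOTE: split_pair, the midpoint heuristic, and the alignment-consistency
# check below are illustrative assumptions, not the paper's exact method.

def split_pair(src_tokens, tgt_tokens, alignment):
    """Cut a parallel sentence pair into two shorter, still-parallel pairs.

    alignment: iterable of (src_idx, tgt_idx) links, e.g. from fast_align.
    Returns ((src1, tgt1), (src2, tgt2)), or None if no clean cut exists.
    """
    cut_src = len(src_tokens) // 2  # naive midpoint cut on the source side
    tgt_left = {j for i, j in alignment if i < cut_src}
    tgt_right = {j for i, j in alignment if i >= cut_src}
    # Accept the cut only if no alignment link crosses it on the target side.
    if tgt_left and tgt_right and max(tgt_left) < min(tgt_right):
        cut_tgt = max(tgt_left) + 1
        return ((src_tokens[:cut_src], tgt_tokens[:cut_tgt]),
                (src_tokens[cut_src:], tgt_tokens[cut_tgt:]))
    return None  # crossing links: keep the original pair unsplit


# Example: the pronoun "it" in the second half must now be disambiguated
# using the first half, which acts as document-level context.
src = "I bought a guitar and it sounds great".split()
tgt = "J'ai acheté une guitare et elle sonne bien".split()
links = [(0, 0), (1, 1), (2, 2), (3, 3), (4, 4), (5, 5), (6, 6), (7, 7)]
print(split_pair(src, tgt, links))

Each resulting half can then serve as the "context" for the other during training, so links that were intra-sentential (e.g., gender agreement between "guitare" and "elle") now cross the segment boundary and must be resolved through the contextual parameters.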
