Diverse Pretrained Context Encodings Improve Document Translation

We propose a new architecture for adapting a sentence-level sequence-to-sequence transformer by incorporating multiple pretrained document context signals, and we assess the impact on translation performance of (1) different pretraining approaches for generating these signals, (2) the quantity of parallel data for which document context is available, and (3) conditioning on source, target, or both source and target contexts. Experiments on the NIST Chinese–English and the IWSLT and WMT English–German tasks support four general conclusions: using pretrained context representations markedly improves sample efficiency, adequate parallel data resources are crucial for learning to use document context, jointly conditioning on multiple context representations outperforms any single representation, and source context is more valuable for translation performance than target-side context. Our best multi-context model consistently outperforms the best existing context-aware transformers.
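To make the kind of conditioning described above concrete, the following is a minimal sketch (in PyTorch) of one way a decoder-side block could cross-attend to several pretrained document-context encodings and fuse them with a learned gate. The module, its parameter names, and the gating mechanism are illustrative assumptions for exposition, not the paper's exact adaptation architecture.

```python
# Minimal sketch (assumed design, not the paper's exact architecture):
# a block that cross-attends to several pretrained context encodings
# and fuses the results with a learned, per-position gate.
import torch
import torch.nn as nn


class MultiContextAttention(nn.Module):
    """Attend to N pretrained context encodings and fuse the results."""

    def __init__(self, d_model: int, n_heads: int, n_contexts: int):
        super().__init__()
        # One cross-attention module per context signal.
        self.attns = nn.ModuleList(
            nn.MultiheadAttention(d_model, n_heads, batch_first=True)
            for _ in range(n_contexts)
        )
        # Gate that weights each context's contribution at every position.
        self.gate = nn.Linear(d_model * n_contexts, n_contexts)

    def forward(self, hidden, contexts):
        # hidden:   (batch, tgt_len, d_model) decoder states
        # contexts: list of (batch, ctx_len_i, d_model) pretrained encodings
        outs = [attn(hidden, ctx, ctx)[0] for attn, ctx in zip(self.attns, contexts)]
        weights = torch.softmax(self.gate(torch.cat(outs, dim=-1)), dim=-1)
        fused = sum(w.unsqueeze(-1) * o
                    for w, o in zip(weights.unbind(dim=-1), outs))
        return hidden + fused  # residual connection


# Usage: fuse a source-side and a target-side document context signal.
block = MultiContextAttention(d_model=512, n_heads=8, n_contexts=2)
hidden = torch.randn(4, 20, 512)
src_ctx, tgt_ctx = torch.randn(4, 64, 512), torch.randn(4, 64, 512)
out = block(hidden, [src_ctx, tgt_ctx])
print(out.shape)  # torch.Size([4, 20, 512])
```

The per-position gate is just one plausible way to combine multiple context representations; concatenation or sequential cross-attention would be equally valid choices under this sketch.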
