Toward Making the Most of Context in Neural Machine Translation

Document-level machine translation manages to outperform sentence-level models by a small margin, but has failed to be widely adopted. We argue that previous research did not make clear use of the global context, and propose a new document-level NMT framework that deliberately models the local context of each sentence with awareness of the global context of the document, in both the source and target languages. We specifically design the model to handle documents containing any number of sentences, including single sentences. This unified approach allows our model to be trained elegantly on standard datasets, without needing to train separately on sentence-level and document-level data. Experimental results demonstrate that our model outperforms Transformer baselines and previous document-level NMT models, with improvements of up to 2.1 BLEU over state-of-the-art baselines. We also provide analyses showing the benefit of context far beyond the two or three neighboring sentences that previous studies have typically incorporated.
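
To make the local/global design concrete, below is a minimal sketch, not the authors' actual model: it assumes a PyTorch implementation in which a standard Transformer encoder supplies the local (intra-sentence) context, a self-attention layer over sentence summaries supplies the global (document-level) context, and the two are fused per token. All names here (DocContextEncoder, d_model, fuse, etc.) are hypothetical. Because the global attention runs over however many sentences the input contains, the same module handles a single sentence and a full document uniformly, mirroring the unified-training claim in the abstract.

```python
# Illustrative sketch only; module names and hyperparameters are assumptions,
# not the paper's released implementation.
import torch
import torch.nn as nn


class DocContextEncoder(nn.Module):
    """Encodes a document so each sentence keeps its local context while
    also attending to a global summary of all sentences."""

    def __init__(self, vocab_size: int, d_model: int = 512, n_heads: int = 8):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, d_model, padding_idx=0)
        # Local context: a standard Transformer encoder applied per sentence.
        self.local = nn.TransformerEncoder(
            nn.TransformerEncoderLayer(d_model, n_heads, batch_first=True),
            num_layers=2,
        )
        # Global context: self-attention over sentence-level summary vectors.
        self.global_attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.fuse = nn.Linear(2 * d_model, d_model)

    def forward(self, doc: torch.Tensor) -> torch.Tensor:
        # doc: (n_sents, max_len) token ids; n_sents may be 1, so single
        # sentences and whole documents go through the same code path.
        x = self.local(self.embed(doc))             # (n_sents, max_len, d_model)
        sent_repr = x.mean(dim=1).unsqueeze(0)      # (1, n_sents, d_model)
        g, _ = self.global_attn(sent_repr, sent_repr, sent_repr)
        g = g.squeeze(0).unsqueeze(1).expand_as(x)  # broadcast global state to tokens
        return self.fuse(torch.cat([x, g], dim=-1))


# Usage: a three-sentence "document" of random token ids (0 = padding).
enc = DocContextEncoder(vocab_size=1000)
out = enc(torch.randint(1, 1000, (3, 12)))
print(out.shape)  # torch.Size([3, 12, 512])
```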
