Document-level Neural Machine Translation with Inter-Sentence Attention

Standard neural machine translation (NMT) rests on the assumption that sentences can be translated independently of their document-level context. Most existing document-level NMT methods only briefly incorporate document-level information and fail to select the parts of the document context most relevant to the current sentence. The capacity of memory networks to retrieve from memory the parts most relevant to the current sentence offers a natural solution to this modeling requirement. In this work, we propose a Transformer NMT system with an associated memory network (AMN) that both captures document-level context and selects from memory the parts most salient to the translation at hand. Experiments on several tasks show that the proposed method significantly improves translation performance over strong Transformer baselines and other related methods.
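
The mechanism described above, attending from the current sentence over a memory of encoded context sentences and gating the retrieved context back into the sentence representation, can be sketched compactly. The following is a minimal PyTorch sketch under stated assumptions: the memory is assumed to hold encoder representations of surrounding sentences, and the class name `InterSentenceMemoryAttention`, the projection layers, and the sigmoid gating scheme are illustrative choices, not the paper's exact AMN implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class InterSentenceMemoryAttention(nn.Module):
    """Attend from the current sentence's encoder states over a memory of
    context-sentence representations, then gate the retrieved context in.

    Illustrative sketch only; the gating scheme is an assumption, not the
    paper's exact AMN design.
    """

    def __init__(self, d_model: int):
        super().__init__()
        self.query = nn.Linear(d_model, d_model)
        self.key = nn.Linear(d_model, d_model)
        self.value = nn.Linear(d_model, d_model)
        self.gate = nn.Linear(2 * d_model, d_model)
        self.scale = d_model ** -0.5

    def forward(self, hidden: torch.Tensor, memory: torch.Tensor) -> torch.Tensor:
        # hidden: (batch, src_len, d_model)   current-sentence encoder states
        # memory: (batch, mem_len, d_model)   encoded context sentences
        q = self.query(hidden)                                  # (B, S, D)
        k = self.key(memory)                                    # (B, M, D)
        v = self.value(memory)                                  # (B, M, D)
        # Scaled dot-product attention scores each memory slot against each
        # source token, so the most relevant context is weighted highest.
        scores = torch.bmm(q, k.transpose(1, 2)) * self.scale   # (B, S, M)
        weights = F.softmax(scores, dim=-1)
        context = torch.bmm(weights, v)                         # (B, S, D)
        # A sigmoid gate controls how much retrieved context is mixed in.
        g = torch.sigmoid(self.gate(torch.cat([hidden, context], dim=-1)))
        return hidden + g * context


if __name__ == "__main__":
    layer = InterSentenceMemoryAttention(d_model=64)
    hidden = torch.randn(2, 10, 64)   # current sentence, 10 tokens
    memory = torch.randn(2, 30, 64)   # 30 memory slots from context sentences
    out = layer(hidden, memory)
    print(out.shape)                  # torch.Size([2, 10, 64])
```

In a full system, such a layer would sit on top of a standard Transformer encoder, with the memory refreshed from neighboring sentences of the same document; the gate lets the model fall back to purely sentence-level translation when the document context is uninformative.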
