Exploiting Sentential Context for Neural Machine Translation

In this work, we present novel approaches to exploit sentential context for neural machine translation (NMT). Specifically, we first show that a shallow sentential context, extracted from the top encoder layer alone, can improve translation performance by contextualizing the encoding representations of individual words. Next, we introduce a deep sentential context, which aggregates the sentential context representations from all internal layers of the encoder to form a more comprehensive context representation. Experimental results on the WMT14 English-to-German and English-to-French benchmarks show that our model consistently improves performance over the strong TRANSFORMER model (Vaswani et al., 2017), demonstrating the necessity and effectiveness of exploiting sentential context for NMT.
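The two variants can be illustrated with a minimal sketch. This is not the paper's implementation: mean pooling as the per-layer summary and uniform aggregation weights are illustrative assumptions (the paper's actual summarization and aggregation functions are learned), and the function name `sentential_context` is hypothetical.

```python
import numpy as np

def sentential_context(layer_outputs, deep=True):
    """Sketch of shallow vs. deep sentential context.

    layer_outputs: list of L arrays, each [seq_len, d_model] --
    the per-layer outputs of a Transformer encoder.
    """
    if not deep:
        # Shallow: summarize only the top encoder layer
        # (mean pooling is one simple, assumed choice of summary).
        return layer_outputs[-1].mean(axis=0)                        # [d_model]
    # Deep: summarize every internal layer, then aggregate the
    # summaries; uniform weights stand in for a learned combination.
    per_layer = np.stack([h.mean(axis=0) for h in layer_outputs])    # [L, d_model]
    weights = np.full(len(layer_outputs), 1.0 / len(layer_outputs))  # placeholder
    return weights @ per_layer                                       # [d_model]

# Toy usage: 6 encoder layers, 5 source tokens, model dim 8.
layers = [np.random.rand(5, 8) for _ in range(6)]
shallow = sentential_context(layers, deep=False)
deep = sentential_context(layers, deep=True)
print(shallow.shape, deep.shape)  # (8,) (8,)
```

Either context vector would then be fused back into the word-level encoder representations before decoding; the deep variant differs only in drawing on all layers rather than the top one.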

[1] Xing Wang, et al. Context-Aware Self-Attention Networks, 2019, AAAI.

[2] Jie Hao, et al. Local Translation Prediction with Global Sentence Representation, 2015, IJCAI.

[3] Di He, et al. Dense Information Flow for Neural Machine Translation, 2018, NAACL.

[4] Yang Liu, et al. Learning to Remember Translation History with a Continuous Cache, 2017, TACL.

[5] Ankur Bapna, et al. Training Deeper Neural Machine Translation Models with Transparent Attention, 2018, EMNLP.

[6] Guillaume Lample, et al. What you can cram into a single $&!#* vector: Probing sentence embeddings for linguistic properties, 2018, ACL.

[7] Roland Kuhn, et al. Mixture-Model Adaptation for SMT, 2007, WMT@ACL.

[8] Xing Wang, et al. Modeling Recurrence for Transformer, 2019, NAACL.

[9] Philipp Koehn, et al. Statistical Significance Tests for Machine Translation Evaluation, 2004, EMNLP.

[10] Jingbo Zhu, et al. Multi-layer Representation Fusion for Neural Machine Translation, 2018, COLING.

[11] Andy Way, et al. Exploiting Cross-Sentence Context for Neural Machine Translation, 2017, EMNLP.

[12] Luke S. Zettlemoyer, et al. Deep Contextualized Word Representations, 2018, NAACL.

[13] Xinyan Xiao, et al. A Topic Similarity Model for Hierarchical Phrase-based Translation, 2012, ACL.

[14] Xu Sun, et al. Deconvolution-Based Global Decoding for Neural Machine Translation, 2018, COLING.

[15] Lukasz Kaiser, et al. Attention is All you Need, 2017, NIPS.

[16] Shuming Shi, et al. Dynamic Layer Aggregation for Neural Machine Translation with Routing-by-Agreement, 2019, AAAI.

[17] Jörg Tiedemann, et al. An Analysis of Encoder Representations in Transformer-Based Machine Translation, 2018, BlackboxNLP@EMNLP.

[18] Eugene A. Nida, et al. Science of Translation, 1969.

[19] Shuming Shi, et al. Exploiting Deep Representations for Neural Machine Translation, 2018, EMNLP.

[20] Phil Blunsom, et al. A Convolutional Neural Network for Modelling Sentences, 2014, ACL.

[21] Bowen Zhou, et al. A Structured Self-attentive Sentence Embedding, 2017, ICLR.

[22] Lukasz Kaiser, et al. Universal Transformers, 2018, ICLR.

[23] Michael R. Lyu, et al. Information Aggregation for Multi-Head Attention with Routing-by-Agreement, 2019, NAACL.

[24] Xing Shi, et al. Does String-Based Neural MT Learn Source Syntax?, 2016, EMNLP.

[25] Min Zhang, et al. Topic-Based Coherence Modeling for Statistical Machine Translation, 2015, IEEE/ACM Transactions on Audio, Speech, and Language Processing.

[26] Tong Zhang, et al. Modeling Localness for Self-Attention Networks, 2018, EMNLP.

[27] Yoshua Bengio, et al. Context-dependent word representation for neural machine translation, 2016, Computer Speech & Language.

[28] Zhaopeng Tu, et al. Modeling Past and Future for Neural Machine Translation, 2017, TACL.

[29] Rico Sennrich, et al. Neural Machine Translation of Rare Words with Subword Units, 2015, ACL.

[30] Philipp Koehn, et al. Combining Domain and Topic Adaptation for SMT, 2014.

[31] Yoshua Bengio, et al. Neural Machine Translation by Jointly Learning to Align and Translate, 2014, ICLR.

[32] Qun Liu, et al. Encoding Source Language with Convolutional Neural Network for Machine Translation, 2015, ACL.

[33] Andy Way, et al. Topic-Informed Neural Machine Translation, 2016, COLING.

[34] Hal Daumé, et al. Deep Unordered Composition Rivals Syntactic Methods for Text Classification, 2015, ACL.