Context-Aware Self-Attention Networks

Self-attention models have shown their flexibility in parallel computation and their effectiveness in modeling both long- and short-term dependencies. However, they compute the dependencies between representations without considering contextual information, which has proven useful for modeling dependencies among neural representations in various natural language tasks. In this work, we focus on improving self-attention networks by capturing the richness of context. To maintain the simplicity and flexibility of self-attention networks, we propose to contextualize the transformations of the query and key layers, which are used to calculate the relevance between elements. Specifically, we leverage internal representations that embed both global and deep contexts, thus avoiding reliance on external resources. Experimental results on the WMT14 English-German and WMT17 Chinese-English translation tasks demonstrate the effectiveness and universality of the proposed methods. Furthermore, we conduct extensive analyses to quantify how the context vectors participate in the self-attention model.
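
To make the idea concrete, the sketch below shows one way the query and key projections of a single-head self-attention layer could be contextualized with a mean-pooled global context vector that is mixed in through learned gates. This is a minimal illustration of the general approach described above, not the paper's exact formulation; the class name `ContextAwareSelfAttention`, the gating scheme, and all hyper-parameters are assumptions made for demonstration.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class ContextAwareSelfAttention(nn.Module):
    """Single-head self-attention whose query/key projections are
    contextualized with a global (mean-pooled) context vector.
    The gating scheme and names here are illustrative assumptions."""

    def __init__(self, d_model: int):
        super().__init__()
        self.w_q = nn.Linear(d_model, d_model)
        self.w_k = nn.Linear(d_model, d_model)
        self.w_v = nn.Linear(d_model, d_model)
        # Project the global context into the query/key spaces.
        self.c_q = nn.Linear(d_model, d_model)
        self.c_k = nn.Linear(d_model, d_model)
        # Scalar gates deciding how much context is mixed in per position.
        self.gate_q = nn.Linear(2 * d_model, 1)
        self.gate_k = nn.Linear(2 * d_model, 1)
        self.scale = d_model ** -0.5

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, seq_len, d_model)
        q, k, v = self.w_q(x), self.w_k(x), self.w_v(x)

        # Global context: mean over the sequence (one vector per sentence).
        c = x.mean(dim=1, keepdim=True)           # (batch, 1, d_model)
        cq, ck = self.c_q(c), self.c_k(c)         # context projections

        # Learned gates interpolate between the token-local projection
        # and the context projection.
        lam_q = torch.sigmoid(self.gate_q(torch.cat([q, cq.expand_as(q)], dim=-1)))
        lam_k = torch.sigmoid(self.gate_k(torch.cat([k, ck.expand_as(k)], dim=-1)))
        q = (1 - lam_q) * q + lam_q * cq
        k = (1 - lam_k) * k + lam_k * ck

        # Standard scaled dot-product attention with the contextualized Q/K.
        attn = F.softmax(q @ k.transpose(-2, -1) * self.scale, dim=-1)
        return attn @ v
```

For example, `ContextAwareSelfAttention(512)(torch.randn(2, 10, 512))` returns a tensor of shape `(2, 10, 512)`; the gates let each position decide how strongly the global context should influence its query and key before the usual dot-product attention is applied.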
