Compositional De-Attention Networks

Attentional models are distinctly characterized by their ability to learn relative importance, i.e., to assign different weights to input values. This paper proposes a new quasi-attention mechanism that is compositional in nature, i.e., it learns whether to \textit{add}, \textit{subtract} or \textit{nullify} a certain vector when learning representations. This stands in strong contrast to vanilla attention, which simply re-weights input tokens. Our proposed \textit{Compositional De-Attention} (CoDA) is fundamentally built on the intuition of using both similarity and dissimilarity (negative affinity) when computing affinity scores, yielding greater expressiveness. We evaluate CoDA on six NLP tasks: open-domain question answering, retrieval/ranking, natural language inference, machine translation, sentiment analysis and text2code generation. We obtain promising experimental results, achieving state-of-the-art performance on several tasks and datasets.
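
To make the add/subtract/nullify intuition concrete, the sketch below illustrates one way a compositional quasi-attention matrix can be formed: a tanh over pairwise similarity supplies the sign of each weight, while a sigmoid over a negative pairwise distance (the dissimilarity, or negative-affinity, signal) gates its magnitude, so every weight lands in [-1, 1] rather than on a probability simplex. The function name `coda_quasi_attention`, the choice of L1 distance, and the temperature parameters `alpha` and `beta` are illustrative assumptions for this sketch, not a verbatim account of the paper's implementation.

```python
import numpy as np

def coda_quasi_attention(Q, K, V, alpha=None, beta=None):
    """Minimal sketch of a compositional quasi-attention layer (assumed form).

    Q: queries of shape (n_q, d); K, V: keys/values of shape (n_k, d).
    Instead of a softmax over similarities, the weight matrix is
    tanh(similarity) * sigmoid(negative distance), so each weight lies
    in [-1, 1]: positive weights add a value vector, negative weights
    subtract it, and near-zero weights effectively nullify it.
    """
    d = Q.shape[-1]
    alpha = np.sqrt(d) if alpha is None else alpha  # similarity temperature (assumed default)
    beta = np.sqrt(d) if beta is None else beta     # dissimilarity temperature (assumed default)

    # Pairwise dot-product similarity (affinity), shape (n_q, n_k).
    E = Q @ K.T
    # Pairwise negative L1 distance (dissimilarity / negative affinity).
    N = -np.abs(Q[:, None, :] - K[None, :, :]).sum(axis=-1)

    # Compose the two signals: tanh decides the sign (add vs. subtract),
    # the sigmoid gate decides whether to keep or nullify the contribution.
    M = np.tanh(E / alpha) * (1.0 / (1.0 + np.exp(-N / beta)))
    return M @ V


# Toy usage: 3 query positions attending over 4 key/value positions.
rng = np.random.default_rng(0)
Q, K, V = rng.normal(size=(3, 8)), rng.normal(size=(4, 8)), rng.normal(size=(4, 8))
out = coda_quasi_attention(Q, K, V)  # shape (3, 8)
```

Unlike a softmax attention row, which is non-negative and sums to one, each row of such a quasi-attention matrix can mix positive, negative and near-zero weights, which is what allows representations to be composed by addition, subtraction or omission.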
