Conditional Self-Attention for Query-based Summarization

Self-attention mechanisms have achieved great success on a variety of NLP tasks thanks to their flexibility in capturing dependencies between arbitrary positions in a sequence. For problems such as query-based summarization (Qsumm) and knowledge graph reasoning, where each input sequence is associated with an extra query, explicitly modeling such conditional contextual dependencies can lead to more accurate solutions; existing self-attention mechanisms, however, cannot capture them. In this paper, we propose conditional self-attention (CSA), a neural network module designed for conditional dependency modeling. CSA adjusts the pairwise attention between input tokens in a self-attention module according to the matching score of the tokens with the given query, so that the contextual dependencies it models are highly relevant to the query. We further study variants of CSA defined by different types of attention. Experiments on the Debatepedia and HotpotQA benchmark datasets show that CSA consistently outperforms the vanilla Transformer and previous models on the Qsumm problem.
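
To make the idea concrete, here is a minimal PyTorch-style sketch of query-conditioned self-attention: pairwise attention logits are shifted by each token's match score against a pooled query vector. The projection layers and the additive combination are our own illustrative assumptions, not the paper's exact formulation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class ConditionalSelfAttention(nn.Module):
    """Illustrative sketch: self-attention whose pairwise scores are
    modulated by how well each token matches an external query.
    Hypothetical parameterization, not the authors' exact CSA module."""

    def __init__(self, dim):
        super().__init__()
        self.q_proj = nn.Linear(dim, dim)
        self.k_proj = nn.Linear(dim, dim)
        self.v_proj = nn.Linear(dim, dim)
        self.match_proj = nn.Linear(dim, dim)  # scores tokens against the query
        self.scale = dim ** -0.5

    def forward(self, x, query):
        # x: (batch, seq_len, dim) token representations
        # query: (batch, dim) pooled representation of the external query
        q = self.q_proj(x)
        k = self.k_proj(x)
        v = self.v_proj(x)

        # Standard pairwise self-attention logits: (batch, seq_len, seq_len).
        logits = torch.matmul(q, k.transpose(-1, -2)) * self.scale

        # One matching score per token against the query: (batch, seq_len).
        match = torch.einsum('bnd,bd->bn', self.match_proj(x), query) * self.scale

        # Condition the pairwise attention on the query: key tokens that match
        # the query better receive more attention from every position.
        logits = logits + match.unsqueeze(1)

        attn = F.softmax(logits, dim=-1)
        return torch.matmul(attn, v)
```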
