A Unified Model for Extractive and Abstractive Summarization using Inconsistency Loss

We propose a unified model combining the strengths of extractive and abstractive summarization. On the one hand, a simple extractive model can obtain sentence-level attention with high ROUGE scores but produces less readable output. On the other hand, a more complicated abstractive model can obtain word-level dynamic attention to generate more readable paragraphs. In our model, sentence-level attention is used to modulate the word-level attention such that words in less attended sentences are less likely to be generated. Moreover, a novel inconsistency loss function is introduced to penalize the inconsistency between the two levels of attention. By training our model end-to-end with the inconsistency loss together with the original extractive and abstractive losses, we achieve state-of-the-art ROUGE scores, and our summaries are rated the most informative and readable on the CNN/Daily Mail dataset in a solid human evaluation.
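As a rough illustration of the two mechanisms described above, the sketch below (hypothetical variable names; it assumes a simple multiplicative modulation and a top-k log penalty, which may differ from the paper's exact formulation) shows how sentence-level attention could scale word-level attention and how an inconsistency penalty could be computed per decoder step.

```python
import numpy as np

def modulate_word_attention(word_attn, sent_attn, word_to_sent):
    """Scale each word's attention by the attention of its sentence, then renormalize."""
    scaled = word_attn * sent_attn[word_to_sent]
    return scaled / scaled.sum()

def inconsistency_loss(word_attn_steps, sent_attn, word_to_sent, k=3):
    """Penalize decoder steps whose top-k attended words lie in weakly attended sentences."""
    losses = []
    for word_attn in word_attn_steps:          # one word-attention vector per decoder step
        top_k = np.argsort(word_attn)[-k:]     # indices of the k most attended words
        score = np.mean(word_attn[top_k] * sent_attn[word_to_sent[top_k]])
        losses.append(-np.log(score + 1e-12))
    return float(np.mean(losses))

# Toy example: 6 source words spread over 2 sentences.
word_to_sent = np.array([0, 0, 0, 1, 1, 1])
sent_attn = np.array([0.8, 0.2])               # the extractor favors sentence 0
word_attn = np.array([0.1, 0.3, 0.1, 0.3, 0.1, 0.1])

print(modulate_word_attention(word_attn, sent_attn, word_to_sent))
print(inconsistency_loss([word_attn], sent_attn, word_to_sent, k=2))
```

In this toy run, the word in the weakly attended sentence loses most of its attention mass after modulation, and the inconsistency penalty is large whenever the decoder's top-attended words disagree with the extractor's sentence scores.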
