A Deep Reinforced Model for Abstractive Summarization

Attentional, RNN-based encoder-decoder models for abstractive summarization have achieved good performance on short input and output sequences. For longer documents and summaries, however, these models often produce repetitive and incoherent phrases. We introduce a neural network model with a novel intra-attention mechanism that attends over the input and the continuously generated output separately, and a new training method that combines standard supervised word prediction with reinforcement learning (RL). Models trained only with supervised learning often exhibit "exposure bias": they assume that the ground truth is provided at each step during training. However, when standard word prediction is combined with the global sequence prediction training of RL, the resulting summaries become more readable. We evaluate this model on the CNN/Daily Mail and New York Times datasets. Our model obtains a ROUGE-1 score of 41.16 on the CNN/Daily Mail dataset, an improvement over previous state-of-the-art models. Human evaluation also shows that our model produces higher-quality summaries.
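
To make the combined training objective more concrete, the sketch below shows one way such a mixed loss could be written. This is a minimal illustration, not the paper's released implementation: the function name mixed_loss, its argument names, and the mixing weight gamma are assumptions made here for clarity, and the RL term follows the common self-critical pattern in which the reward of a sampled summary is compared against the reward of a greedily decoded baseline.

import torch

def mixed_loss(ref_log_prob, sample_log_prob, sample_reward, baseline_reward, gamma=0.5):
    # Supervised term: negative log-likelihood of the ground-truth summary
    # under teacher forcing (standard word prediction).
    ml_loss = -ref_log_prob
    # Policy-gradient term: the log-probability of a sampled summary is
    # scaled by its sequence-level reward (e.g. ROUGE) minus the reward of
    # a baseline summary (e.g. a greedy decode). Rewards carry no gradient.
    rl_loss = -(sample_reward - baseline_reward) * sample_log_prob
    # Weighted mix: RL optimizes a global, sequence-level objective while
    # the supervised term keeps the output fluent and readable.
    return gamma * rl_loss + (1.0 - gamma) * ml_loss

# Toy call with dummy scalars; in practice these values come from the
# decoder and a ROUGE scorer over whole summaries.
loss = mixed_loss(
    ref_log_prob=torch.tensor(-42.0, requires_grad=True),
    sample_log_prob=torch.tensor(-45.0, requires_grad=True),
    sample_reward=torch.tensor(0.38),
    baseline_reward=torch.tensor(0.35),
)
loss.backward()

The key design point the abstract highlights is the interpolation itself: relying on the RL term alone tends to optimize the evaluation metric at the expense of readability, while the supervised term alone suffers from exposure bias.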
