Closed-Book Training to Improve Summarization Encoder Memory

A good neural sequence-to-sequence summarization model should have a strong encoder that can distill and memorize the important information from long input texts, so that the decoder can generate salient summaries based on the encoder’s memory. In this paper, we aim to improve the memorization capabilities of the encoder of a pointer-generator model by adding an additional ‘closed-book’ decoder without attention and pointer mechanisms. Such a decoder forces the encoder to be more selective about the information encoded in its memory state, because this decoder cannot rely on the extra information provided by the attention (and possibly copy) mechanisms, and hence it improves the entire model. On the CNN/Daily Mail dataset, our 2-decoder model significantly outperforms the baseline in terms of ROUGE and METEOR metrics, in both cross-entropy and reinforced setups (as well as in human evaluation). Moreover, our model also achieves higher scores in a test-only DUC-2002 generalizability setup. We further present a memory ability test, two saliency metrics, and several sanity-check ablations (based on a fixed encoder, gradient-flow cuts, and model capacity) to demonstrate that the encoder of our 2-decoder model does in fact learn stronger memory representations than the baseline encoder.
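As a minimal sketch of how such a shared-encoder, two-decoder setup might be wired, the PyTorch-style code below trains one encoder against two decoders: an attentional decoder (the pointer/copy machinery of a full pointer-generator is omitted for brevity) and a ‘closed-book’ decoder that sees only the encoder’s final state. All names here (TwoDecoderSummarizer, joint_loss, lam) are illustrative assumptions, not the paper’s released implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class TwoDecoderSummarizer(nn.Module):
    def __init__(self, vocab_size, emb=128, hid=256):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, emb)
        # Shared encoder: its final state is the "memory" both decoders consume.
        self.encoder = nn.LSTM(emb, hid, batch_first=True)
        # Primary decoder: dot-product attention over encoder outputs.
        self.attn_dec = nn.LSTM(emb, hid, batch_first=True)
        self.attn_out = nn.Linear(2 * hid, vocab_size)
        # Closed-book decoder: no attention, conditioned only on the final state,
        # so the encoder must pack the salient content into that state.
        self.cb_dec = nn.LSTM(emb, hid, batch_first=True)
        self.cb_out = nn.Linear(hid, vocab_size)

    def forward(self, src, tgt_in):
        enc_out, state = self.encoder(self.embed(src))          # (B, S, H)
        # Attentional decoder path.
        dec_out, _ = self.attn_dec(self.embed(tgt_in), state)   # (B, T, H)
        scores = torch.bmm(dec_out, enc_out.transpose(1, 2))    # (B, T, S)
        ctx = torch.bmm(F.softmax(scores, dim=-1), enc_out)     # (B, T, H)
        attn_logits = self.attn_out(torch.cat([dec_out, ctx], dim=-1))
        # Closed-book decoder path: same encoder state, no attention context.
        cb_dec_out, _ = self.cb_dec(self.embed(tgt_in), state)
        cb_logits = self.cb_out(cb_dec_out)
        return attn_logits, cb_logits


def joint_loss(attn_logits, cb_logits, tgt_out, lam=0.5, pad_id=0):
    """Mixed cross-entropy; lam weights the auxiliary closed-book loss."""
    def ce(logits):
        return F.cross_entropy(logits.reshape(-1, logits.size(-1)),
                               tgt_out.reshape(-1), ignore_index=pad_id)
    return (1 - lam) * ce(attn_logits) + lam * ce(cb_logits)
```

In this sketch, only the attentional (pointer-generator) decoder would be used to generate summaries at test time; the closed-book decoder acts purely as an auxiliary training signal that pressures the shared encoder toward stronger memory representations.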
