Fear the REAPER: A System for Automatic Multi-Document Summarization with Reinforcement Learning

This paper explores alternative algorithms, reward functions, and feature sets for multi-document summarization with reinforcement learning, with a strong focus on reproducibility. We show that ROUGE results can be improved by using a unigram and bigram similarity metric when training a learner to select sentences for summarization. Learners are trained to summarize document clusters using various algorithms and reward functions and are then evaluated with ROUGE. Our experiments show statistically significant improvements of 1.33%, 1.58%, and 2.25% in ROUGE-1, ROUGE-2, and ROUGE-L scores, respectively, over the state of the art in automatic summarization with reinforcement learning on the DUC2004 dataset. Furthermore, query-focused extensions of our approach show improvements of 1.37% and 2.31% in ROUGE-2 and ROUGE-SU4, respectively, over query-focused extensions of the state of the art with reinforcement learning on the DUC2006 dataset.
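To make the idea of a unigram and bigram similarity metric concrete, the following is a minimal sketch of one way such a score could be computed between a candidate summary and a reference summary. It is an illustration only, not the reward function used in the paper: the function names, the whitespace tokenization, the recall-style normalization, and the `bigram_weight` parameter are all assumptions introduced here.

```python
from collections import Counter


def ngrams(tokens, n):
    """Count the n-grams (as tuples) in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def ngram_recall(candidate, reference, n):
    """Fraction of reference n-grams that also occur in the candidate.

    Assumption: lowercase whitespace tokenization, with clipped counts
    so a repeated n-gram is only credited as often as it appears in both.
    """
    cand = ngrams(candidate.lower().split(), n)
    ref = ngrams(reference.lower().split(), n)
    total = sum(ref.values())
    if total == 0:
        return 0.0
    overlap = sum((cand & ref).values())  # multiset intersection
    return overlap / total


def similarity(candidate, reference, bigram_weight=0.5):
    """Hypothetical reward: weighted mix of unigram and bigram recall."""
    uni = ngram_recall(candidate, reference, 1)
    bi = ngram_recall(candidate, reference, 2)
    return (1.0 - bigram_weight) * uni + bigram_weight * bi


if __name__ == "__main__":
    print(similarity("the cat sat", "the cat sat on the mat"))
```

In a reinforcement learning setup of the kind the abstract describes, a score like this could serve as the reward for a finished summary, with the reference taken from human-written summaries of the training cluster. The equal weighting of unigrams and bigrams is an arbitrary choice for this sketch.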
