Reader-Aware Multi-Document Summarization: An Enhanced Model and The First Dataset

We investigate the problem of reader-aware multi-document summarization (RA-MDS) and introduce a new dataset for it. To tackle RA-MDS, we extend a variational auto-encoder (VAE) based MDS framework to jointly consider news documents and reader comments. To evaluate summarization performance, we prepare a new dataset and describe the methods used for data collection, aspect annotation, summary writing, and scrutiny by experts. Experimental results show that reader comments can improve summarization performance, which also demonstrates the usefulness of the proposed dataset. The annotated dataset for RA-MDS is available online.
