Unsupervised Aspect-Based Multi-Document Abstractive Summarization

User-generated reviews of products or services provide valuable information to customers. However, a product may have thousands of reviews, far too many to read individually, so short summaries of their contents would save considerable time. We address opinion summarization, a multi-document summarization task, with an unsupervised abstractive neural summarization system. Our system is based on (i) a language model that encodes reviews into a vector space and generates fluent sentences from that same space, and (ii) a clustering step that groups together reviews about the same aspects, allowing the system to generate summary sentences focused on those aspects. Our experiments on the Oposum dataset empirically show the importance of the clustering step.
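
The clustering step in (ii) can be illustrated with a minimal sketch. It is not the system from the paper: the abstract names neither the encoder architecture nor the clustering algorithm, so the sketch assumes a toy bag-of-words `embed` function in place of the language model encoder and plain k-means with hand-picked seeds. All function names and the sample reviews are hypothetical. Averaging the members of each cluster to obtain one vector per aspect is also an assumption, and the decoding of that vector into a summary sentence is omitted.

```python
from collections import Counter
import math


def embed(text, vocab):
    """Toy stand-in for the encoder: a length-normalized bag-of-words vector."""
    counts = Counter(text.lower().split())
    vec = [float(counts[w]) for w in vocab]
    norm = math.sqrt(sum(x * x for x in vec)) or 1.0
    return [x / norm for x in vec]


def mean(vectors):
    """Component-wise mean of a non-empty list of equal-length vectors."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]


def sq_dist(a, b):
    """Squared Euclidean distance between two vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(vectors, seeds, iters=10):
    """Plain k-means; `seeds` are the indices of the initial centroids."""
    centroids = [vectors[i] for i in seeds]
    assign = []
    for _ in range(iters):
        # Assign every vector to its nearest centroid.
        assign = [
            min(range(len(centroids)), key=lambda k: sq_dist(v, centroids[k]))
            for v in vectors
        ]
        # Move each centroid to the mean of its current members.
        for k in range(len(centroids)):
            members = [v for v, a in zip(vectors, assign) if a == k]
            if members:
                centroids[k] = mean(members)
    return assign, centroids


# Hypothetical reviews covering two aspects (battery, screen).
reviews = [
    "battery life is great",
    "the battery lasts long",
    "screen is bright and sharp",
    "sharp screen with bright colors",
]
vocab = sorted({w for r in reviews for w in r.lower().split()})
vectors = [embed(r, vocab) for r in reviews]

# One seed per expected aspect; a real system would choose these automatically.
assign, centroids = kmeans(vectors, seeds=[0, 2])
for review, cluster in zip(reviews, assign):
    print(cluster, review)
```

Under these assumptions, each entry of `centroids` is a single vector standing for one aspect. In a system of the kind the abstract describes, a vector like this would be handed to the language model to generate an aspect-focused summary sentence.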
