Multi-Document Summarization of Evaluative Text

In many decision-making scenarios, people can benefit from knowing what other people's opinions are. As more and more evaluative documents are posted on the Web, summarizing these resources becomes a critical task for many organizations and individuals. This paper presents a framework for summarizing a corpus of evaluative documents about a single entity with a natural language summary. We propose two summarizers: an extractive summarizer and an abstractive one. As an additional contribution, we show how our abstractive summarizer can be modified to generate summaries tailored to a model of the user's preferences that is solidly grounded in decision theory and can be effectively elicited from users. We have tested our framework in three user studies. In the first, we compared the two summarizers. Quantitatively, they performed equally well, and both significantly outperformed a standard baseline approach to multi-document summarization. Trends in the results, as well as qualitative comments from participants, suggest that the two summarizers have different strengths and weaknesses. After this initial study, we realized that the diversity of opinions expressed in the corpus (i.e., its controversiality) might play a critical role in comparing abstraction with extraction. To pinpoint the role of controversiality, we ran a second user study in which we controlled the degree of controversiality of the corpora summarized for the participants. The outcome of this study indicates that, for evaluative text, abstraction tends to be more effective than extraction, particularly when the corpus is controversial. In the third user study, we assessed the effectiveness of our user-tailoring strategy. The results of this experiment confirm that user-tailored summaries are more informative than untailored ones.
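The abstract does not spell out the decision-theoretic preference model. Purely as an illustration, the sketch below assumes a common choice from decision analysis: an additive multiattribute value function whose weights are elicited by asking the user only to rank the attributes, then converted to rank-order centroid (ROC) weights as in the SMARTER method. The content-selection rule (`select_features`, scoring each feature by the user's weight times how often the corpus evaluates it) and all feature names and counts are hypothetical, not the paper's algorithm.

```python
from typing import Dict, List


def roc_weights(n: int) -> List[float]:
    """Rank-order centroid weights (SMARTER): the attribute ranked k-th of n
    (1-based) receives w_k = (1/n) * sum_{i=k..n} 1/i. Weights sum to 1."""
    return [sum(1.0 / i for i in range(k, n + 1)) / n for k in range(1, n + 1)]


def user_weights(ranking: List[str]) -> Dict[str, float]:
    """Map a most-to-least important attribute ranking to ROC weights."""
    return dict(zip(ranking, roc_weights(len(ranking))))


def additive_value(ranking: List[str], scores: Dict[str, float]) -> float:
    """Additive value v(x) = sum_k w_k * v_k(x_k), with each v_k in [0, 1]."""
    weights = user_weights(ranking)
    return sum(weights[a] * scores[a] for a in ranking)


def select_features(ranking: List[str], mentions: Dict[str, int], k: int) -> List[str]:
    """Hypothetical tailoring step: weight each feature's corpus evidence
    (number of evaluations) by the user's weight and keep the k best."""
    weights = user_weights(ranking)
    scored = {f: weights[f] * mentions.get(f, 0) for f in ranking}
    return sorted(scored, key=scored.get, reverse=True)[:k]


# Hypothetical user who cares most about battery life, then image quality, then price.
ranking = ["battery", "image quality", "price"]
mentions = {"battery": 10, "image quality": 40, "price": 25}
print(select_features(ranking, mentions, 2))  # ['image quality', 'battery']
```

With three attributes the ROC weights are 11/18, 5/18, and 2/18, so the user's top-ranked feature dominates unless a lower-ranked feature is discussed far more often in the corpus. ROC weights are attractive here because ranking attributes is much easier for users than assigning numeric weights directly.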
