Multi-Document Viewpoint Summarization Focused on Facts, Opinion and Knowledge

An interactive information retrieval system that tailors its summaries of retrieved documents to each user’s information needs, situation, or purpose of search can help users understand document content. The purpose of this study is to build a multi-document summarizer, “Viewpoint Summarizer With Interactive clustering on Multidocuments (v-SWIM)”, which produces summaries according to such viewpoints. We tested its effectiveness on a new test collection, ViewSumm30, which contains human-made reference summaries of three different summary types for each of 30 document sets. Once a set of documents on a topic (e.g., documents retrieved by a search engine) is provided to v-SWIM, it returns a list of the topics discussed in the document set. The user then selects one or more topics of interest and a summary type (fact-reporting, opinion-oriented, or knowledge-focused), and v-SWIM produces a summary from the viewpoint of the selected topics and summary type. We assume that sentence types and document genres are related to the types of information included in the source documents and are useful for selecting appropriate information for each summary type. “Sentence type” defines the type of information in a sentence; “document genre” defines the type of information in a document. The experiments showed that the proposed system, using automatically identified sentence types and document genres of the source documents, improved the coverage of the system-produced fact-reporting, opinion-oriented, and knowledge-focused summaries by 13.14%, 34.23%, and 15.89%, respectively, compared with our baseline system, which did not differentiate sentence types or document genres.
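The selection step described in the abstract can be illustrated with a minimal sketch. It is not the v-SWIM implementation: the `Sentence` fields, the sentence-type labels, the salience score, and the mapping from summary type to preferred sentence types are all hypothetical, and document genre is left out for brevity. It only shows the idea of keeping sentences that match both the user-selected topics and the sentence types assumed to suit the chosen summary type.

```python
from dataclasses import dataclass

# Hypothetical mapping from summary type to preferred sentence types.
# The sentence-type labels are illustrative assumptions, not the paper's taxonomy.
PREFERRED_SENTENCE_TYPES = {
    "fact-reporting": {"fact"},
    "opinion-oriented": {"opinion"},
    "knowledge-focused": {"knowledge"},
}


@dataclass
class Sentence:
    text: str
    topic: str          # topic label, assumed to come from an upstream clustering step
    sentence_type: str  # assumed to come from an upstream sentence-type classifier
    score: float        # generic salience score (assumed)


def select_sentences(sentences, topics, summary_type, max_sentences=2):
    """Return the texts of the top-scoring sentences that match the selected
    topics and the sentence types preferred for the summary type."""
    preferred = PREFERRED_SENTENCE_TYPES[summary_type]
    candidates = [
        s for s in sentences
        if s.topic in topics and s.sentence_type in preferred
    ]
    # Highest salience first; ties keep their original order (stable sort).
    candidates.sort(key=lambda s: s.score, reverse=True)
    return [s.text for s in candidates[:max_sentences]]
```

Calling `select_sentences(sentences, {"economy"}, "opinion-oriented")` would return at most two opinion-type sentences from the "economy" topic, ordered by the assumed salience score. A fuller version would also weight candidates by the genre of their source document.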
