User-Focused Multi-Document Summarization with Paragraph Clustering and Sentence-Type Filtering

Applying document clustering techniques to multi-document summarization is a challenging problem, mostly because of the redundancy that exists across multiple sources. We compare several document clustering techniques for multi-document summarization on the NTCIR-4 TSC test collection, and conduct an experiment to evaluate the effectiveness of redundancy reduction in producing summaries. From the results, we draw conclusions about the nature of multi-document summarization with respect to redundancy reduction strategies.
