Multi-Topic Multi-Document Summarizer

Current multi-document summarization systems can successfully extract summary sentences, but with many limitations, including low coverage, inaccurate extraction of important sentences, redundancy, and poor coherence among the selected sentences. The present study introduces a new concept of the centroid approach and reports two new techniques for extracting summary sentences from multiple documents. In both techniques, keyphrases are used to weigh sentences and documents. The first summarization technique (Sen-Rich) prefers the sentences with maximum richness, while the second (Doc-Rich) prefers sentences from the centroid document. To demonstrate the application of the new summarization system to Arabic documents, we performed two experiments. First, we applied the ROUGE measure to compare the new techniques with the systems presented at TAC2011. The results show that Sen-Rich outperformed all systems in ROUGE-S. Second, the system was applied to summarize multi-topic documents. Based on human evaluation, the results show that Doc-Rich is superior, producing summary sentences with greater coverage and more cohesion.
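
The contrast between the two selection strategies can be illustrated with a minimal sketch. The abstract does not give the scoring formulas, so everything below is an assumption made for illustration: richness is taken to be the sum of the weights of the keyphrases found in a sentence, the centroid document is taken to be the document with the highest total richness, and the function names (`richness`, `sen_rich`, `doc_rich`, `centroid_document`) are hypothetical. Redundancy removal and sentence ordering are omitted.

```python
from typing import Dict, List, Tuple

# (document index, sentence index, sentence text)
Sentence = Tuple[int, int, str]


def richness(text: str, keyphrases: Dict[str, float]) -> float:
    """Assumed score: sum of the weights of the keyphrases found in the text."""
    lowered = text.lower()
    return sum(w for phrase, w in keyphrases.items() if phrase.lower() in lowered)


def flatten(docs: List[List[str]]) -> List[Sentence]:
    """Turn a list of documents (each a list of sentences) into one flat list."""
    return [(d, s, text) for d, doc in enumerate(docs) for s, text in enumerate(doc)]


def sen_rich(docs: List[List[str]], keyphrases: Dict[str, float], k: int) -> List[str]:
    """Sen-Rich-style selection: take the k richest sentences from any document."""
    ranked = sorted(flatten(docs),
                    key=lambda item: richness(item[2], keyphrases),
                    reverse=True)
    return [text for _, _, text in ranked[:k]]


def centroid_document(docs: List[List[str]], keyphrases: Dict[str, float]) -> int:
    """Assumed centroid: index of the document with the highest total richness."""
    scores = [sum(richness(s, keyphrases) for s in doc) for doc in docs]
    return max(range(len(docs)), key=scores.__getitem__)


def doc_rich(docs: List[List[str]], keyphrases: Dict[str, float], k: int) -> List[str]:
    """Doc-Rich-style selection: prefer centroid-document sentences, then richness."""
    centroid = centroid_document(docs, keyphrases)
    ranked = sorted(flatten(docs),
                    key=lambda item: (item[0] == centroid,
                                      richness(item[2], keyphrases)),
                    reverse=True)
    return [text for _, _, text in ranked[:k]]


# Toy English data, purely illustrative (the paper works on Arabic documents).
example_docs = [
    ["Solar power grows fast.", "Weather was mild."],
    ["Solar power and wind power expand.", "Wind power costs fall.",
     "Solar panels improve."],
]
example_keyphrases = {"solar power": 2.0, "wind power": 1.5, "solar panels": 1.0}
```

On the toy data, both strategies pick the same top sentence, but the second pick differs: the sentence-level strategy takes the next richest sentence from any document, while the document-level strategy stays within the assumed centroid document.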
