Using only cross-document relationships for both generic and topic-focused multi-document summarizations

In recent years graph-ranking based algorithms have been proposed for single document summarization and generic multi-document summarization. The algorithms make use of the “votings” or “recommendations” between sentences to evaluate the importance of the sentences in the documents. This study aims to differentiate the cross-document and within-document relationships between sentences for generic multi-document summarization and adapt the graph-ranking based algorithm for topic-focused summarization. The contributions of this study are two-fold: (1) For generic multi-document summarization, we apply the graph-based ranking algorithm based on each kind of sentence relationship and explore their relative importance for summarization performance. (2) For topic-focused multi-document summarization, we propose to integrate the relevance of the sentences to the specified topic into the graph-ranking based method. Each individual kind of sentence relationship is also differentiated and investigated in the algorithm. Experimental results on DUC 2002–DUC 2005 data demonstrate the great importance of the cross-document relationships between sentences for both generic and topic-focused multi-document summarizations. Even the approach based only on the cross-document relationships can perform better than or at least as well as the approaches based on both kinds of relationships between sentences.

[1]  Chin-Yew Lin,et al.  From Single to Multi-document Summarization : A Prototype System and its Evaluation , 2002 .

[2]  Taher H. Haveliwala Topic-sensitive PageRank , 2002, IEEE Trans. Knowl. Data Eng..

[3]  Atefeh Farzindar,et al.  CATS a topic-oriented multi-document summarization system at DUC 2005 , 2005 .

[4]  Eduard Hovy,et al.  A BE-based Multi-document Summarizer with Query Interpretation , 2005 .

[5]  Yiming Yang,et al.  Topic Detection and Tracking Pilot Study Final Report , 1998 .

[6]  Gerard Salton,et al.  Automatic Text Structuring and Summarization , 1997, Inf. Process. Manag..

[7]  Dragomir R. Radev,et al.  LexPageRank: Prestige in Multi-Document Text Summarization , 2004, EMNLP.

[8]  Ani Nenkova,et al.  A compositional context sensitive multi-document summarizer: exploring the factors that influence summarization , 2006, SIGIR.

[9]  Rada Mihalcea,et al.  A Language Independent Algorithm for Single and Multiple Document Summarization , 2005, IJCNLP.

[10]  Martin F. Porter,et al.  An algorithm for suffix stripping , 1997, Program.

[11]  Kalina Bontcheva,et al.  Robust Generic and Query-based Summarization , 2003, EACL.

[12]  Danushka Bollegala,et al.  A Bottom-Up Approach to Sentence Ordering for Multi-Document Summarization , 2006, ACL.

[13]  Sanda M. Harabagiu,et al.  Topic themes for multi-document summarization , 2005, SIGIR '05.

[14]  Jade Goldstein-Stewart,et al.  The use of MMR, diversity-based reranking for reordering documents and producing summaries , 1998, SIGIR '98.

[15]  Regina Barzilay,et al.  Information Fusion in the Context of Multi-Document Summarization , 1999, ACL.

[16]  Tomek Strzalkowski,et al.  Cross-document summarization by concept classification , 2002, SIGIR '02.

[17]  Eduard H. Hovy,et al.  From Single to Multi-document Summarization , 2002, ACL.

[18]  Inderjeet Mani,et al.  Summarizing Similarities and Differences Among Related Documents , 1997, Information Retrieval.

[19]  Hua Li,et al.  Improving web search results using affinity graph , 2005, SIGIR '05.

[20]  Zhu Zhang,et al.  Towards CST-enhanced summarization , 2002, AAAI/IAAI.

[21]  Dragomir R. Radev,et al.  Using Random Walks for Question-focused Sentence Retrieval , 2005, HLT.

[22]  Stephen G. Pulman,et al.  Sentence ordering with manifold-based classification in multi-document summarization , 2006, EMNLP.

[23]  Regina Barzilay,et al.  Towards Multidocument Summarization by Reformulation: Progress and Prospects , 1999, AAAI/IAAI.

[24]  Qin Lu,et al.  Extractive Summarization using Inter- and Intra- Event Relevance , 2006, ACL.

[25]  Dragomir R. Radev,et al.  Centroid-based summarization of multiple documents , 2004, Inf. Process. Manag..

[26]  Daniel Marcu,et al.  Bayesian Query-Focused Summarization , 2006, ACL.

[27]  Daniel Marcu,et al.  Summarization beyond sentence extraction: A probabilistic approach to sentence compression , 2002, Artif. Intell..

[28]  Eduard H. Hovy,et al.  Automatic Evaluation of Summaries Using N-gram Co-occurrence Statistics , 2003, NAACL.

[29]  Rajeev Motwani,et al.  The PageRank Citation Ranking : Bringing Order to the Web , 1999, WWW 1999.

[30]  Dragomir R. Radev,et al.  LexRank: Graph-based Lexical Centrality as Salience in Text Summarization , 2004, J. Artif. Intell. Res..

[31]  John M. Conroy,et al.  Query-Based Multi-Document Summarization , 2005 .