A Novel Biased Diversity Ranking Model for Query-Oriented Multi-Document Summarization

Query-oriented multi-document summarization (QMDS) attempts to generate a concise piece of text by extracting sentences from a target document collection, with the aim of not only conveying the key content of that corpus but also satisfying the information needs expressed by a given query. Owing to its practical value, QMDS has been intensively studied in recent decades. Three properties are considered crucial for a good summary: relevance, prestige, and low redundancy (or so-called diversity). Unfortunately, most existing work either disregards diversity or handles it with non-optimized heuristics, usually based on greedy sentence selection. Inspired by the manifold-ranking process, which deals with query-biased prestige, and the DivRank algorithm, which captures query-independent diversity ranking, we propose in this paper a novel biased diversity ranking model, named ManifoldDivRank, for query-sensitive summarization tasks. The top-ranked sentences discovered by our algorithm not only enjoy high query-oriented prestige but, more importantly, are also dissimilar to each other. Experimental results on the DUC2005 and DUC2006 benchmark data sets demonstrate the effectiveness of our proposal.
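
To make the general idea of a query-biased diversity ranking concrete, the following is a minimal sketch and not the ManifoldDivRank formulation itself, which is not specified in this abstract. It assumes a DivRank-style vertex-reinforced random walk (using the common pointwise approximation in which the current score stands in for the visit count) whose prior distribution is set to per-sentence query relevance. The function name `biased_divrank`, its parameters `lam` and `alpha`, and the toy similarity matrix are all hypothetical and chosen only for illustration.

```python
def biased_divrank(weights, prior, lam=0.9, alpha=0.25, iters=200):
    """Sketch of a DivRank-style reinforced walk with a query-biased prior.

    weights: symmetric sentence-similarity matrix (list of lists, assumed input).
    prior:   non-negative query-relevance score per sentence (assumed input).
    lam:     weight of the reinforced walk versus the prior jump.
    alpha:   probability mass that leaves a node in the organic walk.
    """
    n = len(weights)

    # Organic transitions: stay put with probability 1 - alpha,
    # otherwise move to a neighbour in proportion to edge weight.
    p0 = []
    for u in range(n):
        deg = sum(weights[u][v] for v in range(n) if v != u)
        row = []
        for v in range(n):
            if v == u:
                row.append(1.0 - alpha if deg > 0 else 1.0)
            else:
                row.append(alpha * weights[u][v] / deg if deg > 0 else 0.0)
        p0.append(row)

    # Normalise the query-relevance prior into a distribution.
    total = sum(prior)
    prior = [x / total for x in prior]

    scores = [1.0 / n] * n
    for _ in range(iters):
        new_scores = [0.0] * n
        for u in range(n):
            # Reinforcement: transitions are re-weighted by the current
            # score of the target node, so strong nodes absorb the mass
            # of their near-duplicates.
            d = sum(p0[u][v] * scores[v] for v in range(n))
            for v in range(n):
                walk = p0[u][v] * scores[v] / d if d > 0 else prior[v]
                p = (1.0 - lam) * prior[v] + lam * walk
                new_scores[v] += scores[u] * p
        scores = new_scores
    return scores


# Toy example: sentences 0 and 1 are near-duplicates, sentence 2 is distinct.
similarity = [
    [0.0, 0.9, 0.2],
    [0.9, 0.0, 0.2],
    [0.2, 0.2, 0.0],
]
relevance = [0.5, 0.4, 0.3]
ranking = biased_divrank(similarity, relevance)
top_sentences = sorted(range(len(ranking)), key=lambda i: -ranking[i])
```

In this sketch the prior carries the query bias, while the reinforcement term is what discourages several mutually similar sentences from all ranking highly; setting `lam=0` reduces the scores to the normalised relevance prior.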
