Query-focused multi-document summarization using hypergraph-based ranking

We propose a novel hybrid method to capture group relations among sentences. We cluster sentences using a KL-divergence measure based on word-topic distributions. We propose a vertex-reinforced random walk process in a hypergraph model. The process simultaneously considers the query similarity, the centrality, and the diversity of sentences. We implement our framework and verify its improvement over appropriate baselines.

Random walks on general graphs have been successfully applied to multi-document summarization, but the approach has limitations, notably that the edges of an ordinary graph capture only pairwise relations between sentences. In this paper, we propose a novel hypergraph-based vertex-reinforced random walk framework for multi-document summarization. The framework first uses the Hierarchical Dirichlet Process (HDP) topic model to learn a word-topic probability distribution for sentences. A hypergraph is then used to capture both the cluster relationships derived from the word-topic probability distribution and the pairwise similarity among sentences. Finally, a time-variant random walk algorithm for hypergraphs is developed to rank sentences, using vertex reinforcement to ensure sentence diversity in summaries. Experimental results on a publicly available dataset demonstrate the effectiveness of our framework.
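
The ranking step described above can be illustrated with a minimal sketch. It is not the paper's exact formulation: it assumes the standard hypergraph random-walk transition rule (as in Zhou et al., "Learning with Hypergraphs") and a DivRank-style pointwise reinforcement update, with a query-similarity prior as the restart distribution. The function names, the damping value, and the toy hyperedges, weights, and prior are all hypothetical.

```python
def hypergraph_transition(n, hyperedges, weights):
    """Row-stochastic matrix with P[u][v] = sum_e w(e) h(u,e) h(v,e) / (d(u) * |e|).

    Assumes every vertex belongs to at least one hyperedge and that each
    hyperedge lists distinct vertex indices.
    """
    # Vertex degree d(u): total weight of the hyperedges containing u.
    degree = [0.0] * n
    for edge, w in zip(hyperedges, weights):
        for u in edge:
            degree[u] += w
    P = [[0.0] * n for _ in range(n)]
    for edge, w in zip(hyperedges, weights):
        for u in edge:
            for v in edge:
                # Pick edge e with probability w(e)/d(u), then a vertex of e uniformly.
                P[u][v] += w / (degree[u] * len(edge))
    return P


def reinforced_walk(P, prior, damping=0.85, iterations=100):
    """DivRank-style vertex-reinforced walk (assumed update rule, not the paper's).

    Transitions into a vertex are scaled by its current score, so sentences
    that are already highly ranked draw score away from their neighbours,
    which favours diversity among the top-ranked sentences.
    """
    n = len(P)
    total = sum(prior)
    prior = [x / total for x in prior]  # normalise the query-similarity prior
    scores = prior[:]
    for _ in range(iterations):
        new_scores = [(1.0 - damping) * prior[v] for v in range(n)]
        for u in range(n):
            # Normaliser of the reinforced (time-variant) transition row for u.
            norm = sum(P[u][v] * scores[v] for v in range(n))
            if norm == 0.0:
                continue
            for v in range(n):
                new_scores[v] += damping * scores[u] * P[u][v] * scores[v] / norm
        scores = new_scores
    return scores


# Toy example: 4 sentences, 3 hyperedges (e.g. two topic clusters and one pair).
hyperedges = [[0, 1, 2], [2, 3], [0, 1]]
weights = [1.0, 0.5, 2.0]
prior = [0.4, 0.3, 0.2, 0.1]  # hypothetical query-similarity scores

P = hypergraph_transition(4, hyperedges, weights)
scores = reinforced_walk(P, prior)
ranking = sorted(range(4), key=lambda i: scores[i], reverse=True)
print(ranking)
```

In this sketch the hyperedges stand in for the sentence clusters obtained from the word-topic distributions, and the prior stands in for query similarity. A summary would then be built by taking sentences in ranked order until a length limit is reached.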
