CorrRank: Update Summarization Based on Topic Correlation Analysis

In this paper, we propose a novel update summarization framework based on topic correlation analysis. Topics are first extracted from the two document sets provided in the update summarization task using the Latent Dirichlet Allocation (LDA) topic model. Correlations between the new topics and the old topics are then identified, and on that basis we define four categories of topic evolution patterns that capture the topic shift between the two document collections. We develop a new sentence ranking algorithm, CorrRank, which incorporates topic evolution into both sentence ranking and sentence selection for update summarization. We evaluate the proposed model on the DUC 2008 and 2009 query-oriented multi-document update summarization tasks. Experimental results demonstrate the effectiveness of the update summarization framework based on LDA topic correlation analysis.
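
As a rough illustration of the topic correlation step, the sketch below matches each new topic to its most similar old topic. It is not the method used by CorrRank: the abstract does not specify the correlation measure, so cosine similarity over topic-word distributions is an assumption, and the names `cosine`, `correlate_topics`, and the toy topics are hypothetical. The four evolution categories are not named in the abstract and are therefore not modeled here.

```python
import math

def cosine(p, q):
    """Cosine similarity between two sparse word -> probability dicts."""
    dot = sum(w * q.get(term, 0.0) for term, w in p.items())
    norm_p = math.sqrt(sum(w * w for w in p.values()))
    norm_q = math.sqrt(sum(w * w for w in q.values()))
    return dot / (norm_p * norm_q) if norm_p and norm_q else 0.0

def correlate_topics(old_topics, new_topics):
    """For each new topic, return (index of most similar old topic, similarity).

    Assumes old_topics is non-empty. Ties resolve to the lowest index.
    """
    matches = []
    for new in new_topics:
        sims = [cosine(new, old) for old in old_topics]
        best = max(range(len(sims)), key=sims.__getitem__)
        matches.append((best, sims[best]))
    return matches

# Toy topic-word distributions standing in for LDA output (made-up values).
old_topics = [{"earthquake": 0.6, "rescue": 0.4}, {"election": 0.7, "vote": 0.3}]
new_topics = [{"earthquake": 0.5, "aid": 0.5}, {"trial": 0.8, "court": 0.2}]

matches = correlate_topics(old_topics, new_topics)
for i, (j, sim) in enumerate(matches):
    print(f"new topic {i} -> old topic {j} (similarity {sim:.3f})")
```

A new topic with high similarity to an old topic plausibly continues it, while one with near-zero similarity to every old topic plausibly introduces new content; how such scores map onto evolution patterns and sentence ranking is left to the paper itself.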
