Paper information - Discovery of Topically Coherent Sentences for Extractive Summarization


Extractive methods for multi-document summarization are mainly governed by information overlap, coherence, and content constraints. We present an unsupervised probabilistic approach that models the hidden abstract concepts across documents, as well as the correlation between these concepts, to generate topically coherent and non-redundant summaries. Based on human evaluations, our models generate summaries with higher linguistic quality in terms of coherence, readability, and redundancy than benchmark systems. Although our system is unsupervised and optimized for topical coherence, we achieve a ROUGE score of 44.1 on the DUC-07 test set, roughly in the range of state-of-the-art supervised models.

Dilek Z. Hakkani-Tür | Asli Çelikyilmaz
