Multi-document Summarization Using Minimum Distortion

Document summarization plays an important role in natural language processing and text mining. This paper proposes several novel information-theoretic models for multi-document summarization. The models treat document summarization as a transmission system and assume that the best summary is the one with minimum distortion. With a proper distortion measure and a new representation method, the combination of the last two proposed models (the linear representation model and the facility location model) achieves good experimental results on the DUC2002 and DUC2004 datasets. We also show that the model is highly interpretable and extensible.
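The minimum-distortion idea can be illustrated with a small sketch. The abstract does not specify the distortion measure, the linear representation, or the optimization procedure, so everything below is an assumption made for illustration: sentences are represented as term-frequency vectors, distortion is taken to be one minus cosine similarity, and a simple greedy search stands in for whatever solver the paper uses for the facility location objective. The function names are hypothetical.

```python
import math
from collections import Counter


def bag_of_words(sentence):
    """Lowercased term-frequency vector for one sentence (assumed representation)."""
    return Counter(sentence.lower().split())


def distortion(a, b):
    """Assumed distortion: 1 - cosine similarity of two term-frequency vectors."""
    dot = sum(a[w] * b[w] for w in a if w in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 1.0
    return 1.0 - dot / (norm_a * norm_b)


def total_distortion(vectors, chosen):
    """Facility-location-style cost: each sentence is 'served' by its
    closest chosen sentence, and the per-sentence distortions are summed."""
    return sum(min(distortion(v, vectors[j]) for j in chosen) for v in vectors)


def greedy_summary(sentences, k):
    """Greedily pick up to k sentence indices, each time adding the
    sentence that yields the lowest total distortion."""
    vectors = [bag_of_words(s) for s in sentences]
    chosen = []
    for _ in range(min(k, len(sentences))):
        candidates = [i for i in range(len(sentences)) if i not in chosen]
        best = min(candidates, key=lambda i: total_distortion(vectors, chosen + [i]))
        chosen.append(best)
    return sorted(chosen)


if __name__ == "__main__":
    docs = [
        "the cat sat on the mat",
        "the cat sat on a mat",
        "stocks fell sharply today",
        "stocks fell today",
    ]
    # Print the sentences selected as a two-sentence summary.
    for index in greedy_summary(docs, 2):
        print(docs[index])
```

With two clearly separated topics in the input, the selection covers one sentence from each topic, because a second sentence on an already covered topic lowers the total distortion far less than a sentence on the uncovered one.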
