Multi-Document Summarization Using Graph-Based Iterative Ranking Algorithms and Information Theoretical Distortion Measures

Text summarization is an important area of natural language processing and text mining. This paper proposes an extraction-based model for multi-document summarization that combines graph-based and information-theoretic concepts. Our method constructs a directed weighted graph from the original text by adding a vertex for each sentence and computing a weighted edge between sentences based on distortion measures. A ranking algorithm is then applied to the graph to identify the most important sentences to include in the summary. With a suitable distortion measure and ranking algorithm, the model achieves promising results on DUC2002, a well-known real-world data set: its ROUGE-1 scores are fairly close to those of other successful models.
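
The pipeline described above (sentence vertices, distortion-weighted directed edges, iterative ranking) can be illustrated with a minimal sketch. The abstract does not specify the distortion measure or the ranking algorithm, so everything concrete here is an assumption made for illustration: the word-coverage `distortion` function, the PageRank-style update in `rank`, the damping factor of 0.85, and all function names are hypothetical and are not taken from the paper.

```python
import re


def tokenize(sentence):
    """Lowercase a sentence and return its set of alphanumeric tokens."""
    return set(re.findall(r"[a-z0-9]+", sentence.lower()))


def distortion(src, dst):
    """Hypothetical asymmetric distortion (an assumption, not the paper's
    measure): the share of dst's words that src fails to cover."""
    if not dst:
        return 1.0
    return 1.0 - len(src & dst) / len(dst)


def build_graph(sentences):
    """Directed weighted graph as an adjacency matrix.

    weights[i][j] is high when sentence i represents sentence j with
    low distortion. Self-loops are left at zero.
    """
    bags = [tokenize(s) for s in sentences]
    n = len(bags)
    weights = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i != j:
                weights[i][j] = 1.0 - distortion(bags[i], bags[j])
    return weights


def rank(weights, damping=0.85, iterations=50):
    """PageRank-style iterative ranking over the weighted graph.

    Each vertex distributes its score along outgoing edges in
    proportion to edge weight; vertices with no outgoing weight
    distribute nothing.
    """
    n = len(weights)
    scores = [1.0 / n] * n
    out_sums = [sum(row) for row in weights]
    for _ in range(iterations):
        updated = []
        for j in range(n):
            incoming = sum(
                scores[i] * weights[i][j] / out_sums[i]
                for i in range(n)
                if out_sums[i] > 0
            )
            updated.append((1 - damping) / n + damping * incoming)
        scores = updated
    return scores


def summarize(sentences, k=2):
    """Return the k top-ranked sentences in their original order."""
    scores = rank(build_graph(sentences))
    top = sorted(range(len(sentences)), key=lambda i: scores[i], reverse=True)[:k]
    return [sentences[i] for i in sorted(top)]


if __name__ == "__main__":
    docs = [
        "the cat sat on the mat",
        "the cat lay on the mat",
        "a cat sat on a mat",
        "quantum physics is hard",
    ]
    for line in summarize(docs, k=2):
        print(line)
```

In this sketch a sentence that shares no vocabulary with the rest of the input receives no incoming weight and therefore keeps only the baseline score, so it is never preferred over sentences that overlap with others. A real system would likely replace the set-overlap distortion with a measure defined over term distributions and would add redundancy control when selecting sentences.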
