Paper information - Multi-document Automatic Text Summarization Using Entropy Estimates

This paper describes a sentence-ranking technique that uses entropy measures in a multi-document summarization application for unstructured text. The method is topic-specific and uses a simple, language-independent training framework to calculate the entropies of symbol units. The document set is summarized by assigning entropy-based scores to a reduced set of sentences, obtained using a graph representation of sentence similarity. On the same data set, the method performs better than some common statistical techniques. Commonly used measures such as precision, recall, and F-score are modified to form a new set of measures for comparing summarizers, and the rationale for the modification is presented. Experimental results illustrate the relevance of the method where language-specific dictionaries, translators, and document-summary pairs for training are difficult to obtain.
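The abstract does not specify the symbol units, the entropy estimator, or how the similarity graph is built, so the following is only an illustrative sketch of the general idea, not the authors' method. It assumes whitespace-separated words as symbol units, an add-one smoothed unigram model trained on topic text, Jaccard word overlap as the sentence-similarity edge weight, and a greedy reduction step; all function names and the `threshold` parameter are hypothetical.

```python
import math
from collections import Counter


def train_unigram(corpus_sentences):
    """Estimate word probabilities from topic-specific training sentences."""
    counts = Counter(w for s in corpus_sentences for w in s.lower().split())
    total = sum(counts.values())
    vocab = len(counts)

    def prob(word):
        # Add-one smoothing (an assumption): unseen words share one extra slot.
        return (counts[word] + 1) / (total + vocab + 1)

    return prob


def entropy_score(sentence, prob):
    """Sum the per-word entropy contributions -p * log2(p) over a sentence."""
    return sum(-prob(w) * math.log2(prob(w)) for w in sentence.lower().split())


def jaccard(a, b):
    """Word-overlap similarity, used here as the edge weight between sentences."""
    sa, sb = set(a.lower().split()), set(b.lower().split())
    union = sa | sb
    return len(sa & sb) / len(union) if union else 0.0


def reduce_sentences(sentences, threshold=0.5):
    """Greedily keep a sentence only if it is not too similar to any kept one."""
    kept = []
    for s in sentences:
        if all(jaccard(s, k) < threshold for k in kept):
            kept.append(s)
    return kept


def summarize(sentences, prob, n=2, threshold=0.5):
    """Score the reduced sentence set by entropy and return the top n."""
    candidates = reduce_sentences(sentences, threshold)
    ranked = sorted(candidates, key=lambda s: entropy_score(s, prob), reverse=True)
    return ranked[:n]
```

A call such as `summarize(document_sentences, train_unigram(topic_sentences), n=3)` would return up to three sentences. Because the scoring depends only on symbol counts, a sketch like this needs no dictionary or translator, which is the kind of setting the abstract targets.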

K. R. Ramakrishnan | N. Balakrishnan | G. Ravindra
