Paper Information - Combining a Multi-document Summarization System with a Genetic Algorithm

In this paper, we present a combination of a multi-document summarization system with a genetic algorithm. We first introduce a novel approach to automatic summarization. CBSEAS, the system that implements this approach, integrates a new redundancy-detection method at its core in order to produce summaries with good informational diversity. However, the evaluation of our system at TAC 2008 (Text Analysis Conference) revealed that adapting the system to a specific domain is essential to obtain summaries of acceptable quality. The second part of this paper is dedicated to a genetic algorithm that aims to adapt our system to specific domains. We present its evaluation at TAC 2009 on a newswire article summarization task and show that this optimization has a strong influence on both human and automatic evaluations.

Christophe Rodrigues | Aurélien Bossard
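
The domain-adaptation idea in the abstract, a genetic algorithm that searches for system parameters suited to a given domain, can be illustrated with a toy sketch. This is not the authors' implementation: the function `evolve`, its parameters, the weight encoding in [0, 1], and the fitness function `toy_fitness` are all assumptions made for illustration. In a real setting the fitness would presumably run the summarizer with the candidate parameters and score the resulting summaries; here a simple distance to a fixed target vector stands in for that score.

```python
import random


def evolve(fitness, n_params, pop_size=30, generations=60,
           mutation_rate=0.2, mutation_scale=0.1, elite=2, seed=0):
    """Toy genetic algorithm: maximize `fitness` over weight vectors in [0, 1]."""
    rng = random.Random(seed)
    # Random initial population of candidate weight vectors.
    population = [[rng.random() for _ in range(n_params)]
                  for _ in range(pop_size)]
    for _ in range(generations):
        ranked = sorted(population, key=fitness, reverse=True)
        # Elitism: the best individuals survive unchanged.
        next_gen = [list(ind) for ind in ranked[:elite]]
        # Truncation selection: only the better half may reproduce.
        parents = ranked[:pop_size // 2]
        while len(next_gen) < pop_size:
            mother, father = rng.sample(parents, 2)
            # Uniform crossover: each gene comes from one parent at random.
            child = [rng.choice(pair) for pair in zip(mother, father)]
            # Gaussian mutation, clipped back into [0, 1].
            child = [
                min(1.0, max(0.0, gene + rng.gauss(0.0, mutation_scale)))
                if rng.random() < mutation_rate else gene
                for gene in child
            ]
            next_gen.append(child)
        population = next_gen
    return max(population, key=fitness)


# Hypothetical stand-in for a summary-quality score: the closer the
# weights are to TARGET, the higher the fitness (0 is the maximum).
TARGET = [0.8, 0.2, 0.5]


def toy_fitness(weights):
    return -sum((w - t) ** 2 for w, t in zip(weights, TARGET))


best = evolve(toy_fitness, n_params=3)
print(best, toy_fitness(best))
```

Because the best individuals are copied unchanged into each new generation, the best fitness in the population never decreases from one generation to the next. The choice of uniform crossover, truncation selection, and Gaussian mutation is one common configuration among many and is not taken from the paper.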