Text Filtering for the Purpose of Producing a Multi-Document Summary

In the context of the Document Understanding Conference (DUC), we have developed an automatic multi-document summarization system based on the extraction of key sentences. The proposed method uses a genetic algorithm that combines sentences from the source documents to produce extracts. These extracts are then crossed and mutated to generate new extracts. Examination of the results obtained in the two sessions DUC'04 and DUC'07 showed a significant variation in system performance. Indeed, a phenomenon of genetic drift is observed when the system processes large input texts. To solve this problem, we propose integrating an additional sentence-filtering module that reduces the number of sentences in the input. This filtering is based on the concept of predominance between sentences, which makes it possible to eliminate a large number of sentences from the initial pool.
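
The abstract does not define predominance formally. As an illustration only, the sketch below assumes it behaves like Pareto dominance over per-sentence scores: one sentence dominates another when it is at least as good on every criterion and strictly better on at least one. The `Sentence` type, the two-criterion scores, and the function names are hypothetical and are not taken from the described system.

```python
from typing import NamedTuple


class Sentence(NamedTuple):
    text: str
    # Hypothetical per-criterion scores (higher is better); the actual
    # criteria used by the system are not given in the abstract.
    scores: tuple[float, ...]


def dominates(a: Sentence, b: Sentence) -> bool:
    """True if a is at least as good as b everywhere and strictly better once."""
    pairs = list(zip(a.scores, b.scores))
    return all(x >= y for x, y in pairs) and any(x > y for x, y in pairs)


def filter_dominated(pool: list[Sentence]) -> list[Sentence]:
    """Keep only the sentences that no other sentence in the pool dominates."""
    return [s for s in pool if not any(dominates(o, s) for o in pool)]


pool = [
    Sentence("A", (0.9, 0.8)),
    Sentence("B", (0.5, 0.4)),   # dominated by A on both criteria
    Sentence("C", (0.3, 0.95)),  # kept: best on the second criterion
    Sentence("D", (0.9, 0.7)),   # dominated by A (tie, then worse)
]
kept = filter_dominated(pool)
print([s.text for s in kept])  # ['A', 'C']
```

Under this assumption, only the non-dominated sentences would be passed to the genetic algorithm, which shrinks the search space it has to explore.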
