Calculating the Upper Bounds for Multi-Document Summarization using Genetic Algorithms

Over the last years, several Multi-Document Summarization (MDS) methods have been presented at the Document Understanding Conference (DUC) workshops. Since DUC01, approximately 268 state-of-the-art publications have presented methods that have allowed the continuous improvement of MDS; however, in most of these works the upper bounds were unknown. Recently, some works have focused on calculating the best sentence combinations of a set of documents, and in previous works we calculated the significance for the single-document summarization task on the DUC01 and DUC02 datasets. However, for the MDS task, no analysis of significance has been performed to rank the best multi-document summarization methods. In this paper, we propose a method based on Genetic Algorithms for calculating the best sentence combinations of the DUC01 and DUC02 datasets in MDS through a meta-document representation. Moreover, we have calculated three heuristics mentioned in several state-of-the-art works to rank the most recent MDS methods through the calculation of upper bounds and lower bounds.
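The idea of searching for the best sentence combination with a genetic algorithm can be illustrated with a minimal sketch. It is not the method of the paper: the binary chromosome, tournament selection, uniform crossover, bit-flip mutation, the unigram-recall fitness (a simplified stand-in for ROUGE), and every function and parameter name below are assumptions made for illustration. The `sentences` argument plays the role of a meta-document, that is, the sentences of all source documents pooled into one list.

```python
import random
from collections import Counter


def unigram_recall(candidate_tokens, reference_tokens):
    """Clipped unigram overlap divided by reference length (ROUGE-1-recall-like)."""
    ref = Counter(reference_tokens)
    cand = Counter(candidate_tokens)
    overlap = sum(min(count, cand[tok]) for tok, count in ref.items())
    return overlap / max(1, len(reference_tokens))


def fitness(mask, sentences, reference_tokens, max_words):
    """Score a sentence selection; selections over the word budget score 0."""
    chosen = [tok for bit, sent in zip(mask, sentences) if bit for tok in sent]
    if len(chosen) > max_words:
        return 0.0
    return unigram_recall(chosen, reference_tokens)


def ga_upper_bound(sentences, reference, max_words,
                   pop_size=30, generations=60, mutation_rate=0.1, seed=0):
    """Hypothetical GA: each chromosome is a 0/1 mask over the pooled sentences."""
    rng = random.Random(seed)
    sents = [s.lower().split() for s in sentences]
    ref = reference.lower().split()
    n = len(sents)

    def score(mask):
        return fitness(mask, sents, ref, max_words)

    # Sparse random initial population (assumed 30% chance of selecting a sentence).
    population = [[1 if rng.random() < 0.3 else 0 for _ in range(n)]
                  for _ in range(pop_size)]
    best = max(population, key=score)

    for _ in range(generations):
        next_gen = [best[:]]  # elitism: carry the best individual over unchanged
        while len(next_gen) < pop_size:
            # Tournament selection of two parents (tournament size 3).
            p1 = max(rng.sample(population, 3), key=score)
            p2 = max(rng.sample(population, 3), key=score)
            # Uniform crossover followed by bit-flip mutation.
            child = [a if rng.random() < 0.5 else b for a, b in zip(p1, p2)]
            child = [1 - g if rng.random() < mutation_rate else g for g in child]
            next_gen.append(child)
        population = next_gen
        best = max(population, key=score)

    return best, score(best)
```

Calling `ga_upper_bound(pooled_sentences, gold_summary, max_words=100)` returns the best mask found and its score. Because the search is heuristic, the score is only an estimate of the upper bound for that word limit, not a guaranteed optimum.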
