Extractive multi-document text summarization using a multi-objective artificial bee colony optimization approach

Abstract Automatic text summarization methods are increasingly needed. Extractive multi-document summarization approaches aim to capture the main content of a document collection while reducing redundant information. This task can be addressed as an optimization problem, yet multi-objective approaches have rarely been applied in this context. In this paper, a Multi-Objective Artificial Bee Colony (MOABC) algorithm has been designed and implemented for this task. Experiments have been performed on datasets from the Document Understanding Conference (DUC), and model performance has been evaluated with Recall-Oriented Understudy for Gisting Evaluation (ROUGE) metrics, as is standard in this field. The proposed approach shows important improvements: on average, ROUGE-2 (ROUGE-L) improves by 31.09% (8.43%) over the best single-objective results and by 18.63% (6.09%) over the best multi-objective results in the scientific literature. Moreover, the proposed approach produces more concentrated ROUGE values when the algorithm execution is repeated (between 620.63% and 1333.95% of reduction in the relative dispersion, that is, between 6 and 13 times better), leading to more robust results.
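
To make the multi-objective view concrete, the following minimal Python sketch scores candidate summaries on two objectives, content coverage (to maximize) and redundancy (to minimize), and compares them by Pareto dominance. It is not the MOABC algorithm, and the abstract does not define the objective functions: the bag-of-words cosine similarity, the centroid-based coverage, and the names `cosine`, `objectives`, and `dominates` are all assumptions chosen for illustration.

```python
import math
from collections import Counter


def cosine(a, b):
    """Cosine similarity between two term-frequency Counters."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def objectives(sentences, selected):
    """Return (coverage, redundancy) for the sentences at indices `selected`.

    Assumed formulation (not taken from the paper):
    coverage   = sum of similarities between each chosen sentence and the
                 term-frequency centroid of the whole collection
    redundancy = sum of pairwise similarities among the chosen sentences
    """
    vectors = [Counter(s.lower().split()) for s in sentences]
    centroid = Counter()
    for v in vectors:
        centroid.update(v)
    chosen = [vectors[i] for i in selected]
    coverage = sum(cosine(v, centroid) for v in chosen)
    redundancy = sum(
        cosine(chosen[i], chosen[j])
        for i in range(len(chosen))
        for j in range(i + 1, len(chosen))
    )
    return coverage, redundancy


def dominates(a, b):
    """True if score a = (coverage, redundancy) Pareto-dominates score b."""
    no_worse = a[0] >= b[0] and a[1] <= b[1]
    strictly_better = a[0] > b[0] or a[1] < b[1]
    return no_worse and strictly_better


if __name__ == "__main__":
    docs = ["the cat sat", "the cat sat", "dogs bark loudly"]
    repeated = objectives(docs, [0, 1])  # two identical sentences
    diverse = objectives(docs, [0, 2])   # two unrelated sentences
    print(repeated, diverse)
    print(dominates(repeated, diverse), dominates(diverse, repeated))
```

In this toy collection the repeated pair scores higher on coverage but also higher on redundancy, so neither candidate dominates the other. A multi-objective optimizer would keep both in its non-dominated set instead of collapsing the two objectives into a single weighted score.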
