A General Optimization Framework for Multi-Document Summarization Using Genetic Algorithms and Swarm Intelligence

Integer linear programming and submodular optimization are popular and successful techniques in extractive multi-document summarization. However, many interesting optimization objectives are neither submodular nor factorizable into an integer linear program. We address this issue and present a general optimization framework into which any function of the input documents and a system summary can be plugged. Our framework includes two kinds of summarizers: one based on genetic algorithms, the other using a swarm intelligence approach. In our experimental evaluation, we investigate the optimization of two information-theoretic summary evaluation metrics and find that our framework yields competitive results compared to several strong summarization baselines. Our comparative analysis of the genetic and swarm summarizers reveals interesting complementary properties.
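To make the idea of a pluggable objective concrete, here is a minimal sketch of a genetic summarizer, not the authors' implementation. Everything named here is hypothetical: `genetic_summarize`, `repair`, the bit-mask encoding, truncation selection, one-point crossover, and the toy `unigram_coverage` objective, which only stands in for the information-theoretic metrics mentioned above. The one point it illustrates is that the search treats the objective as a black box, so it need not be submodular or expressible as an integer linear program.

```python
import random
from typing import Callable, List, Sequence

# Assumed interface: objective(summary_sentences, input_sentences) -> score to maximize.
Objective = Callable[[Sequence[str], Sequence[str]], float]


def unigram_coverage(summary: Sequence[str], sentences: Sequence[str]) -> float:
    """Toy objective (an assumption): share of distinct input words in the summary."""
    doc_words = {w for s in sentences for w in s.lower().split()}
    sum_words = {w for s in summary for w in s.lower().split()}
    return len(doc_words & sum_words) / len(doc_words) if doc_words else 0.0


def repair(mask: List[int], sentences: Sequence[str], budget: int,
           rng: random.Random) -> List[int]:
    """Deselect randomly chosen sentences until the word budget is respected."""
    mask = list(mask)
    chosen = [i for i, bit in enumerate(mask) if bit]
    rng.shuffle(chosen)
    length = sum(len(sentences[i].split()) for i in chosen)
    while chosen and length > budget:
        i = chosen.pop()
        mask[i] = 0
        length -= len(sentences[i].split())
    return mask


def genetic_summarize(sentences: Sequence[str], objective: Objective, budget: int,
                      pop_size: int = 30, generations: int = 50,
                      mutation_rate: float = 0.05, seed: int = 0) -> List[str]:
    """Search over sentence subsets (bit masks) for a high-scoring summary."""
    rng = random.Random(seed)
    n = len(sentences)

    def fitness(mask: List[int]) -> float:
        # The objective is only ever called, never inspected.
        return objective([s for s, bit in zip(sentences, mask) if bit], sentences)

    population = [repair([rng.randint(0, 1) for _ in range(n)], sentences, budget, rng)
                  for _ in range(pop_size)]
    for _ in range(generations):
        ranked = sorted(population, key=fitness, reverse=True)
        survivors = ranked[: pop_size // 2]  # truncation selection keeps the best half
        children = []
        while len(survivors) + len(children) < pop_size:
            a, b = rng.sample(survivors, 2)
            cut = rng.randint(1, n - 1) if n > 1 else 0  # one-point crossover
            child = a[:cut] + b[cut:]
            # Flip each bit with probability mutation_rate.
            child = [bit ^ 1 if rng.random() < mutation_rate else bit for bit in child]
            children.append(repair(child, sentences, budget, rng))
        population = survivors + children
    best = max(population, key=fitness)
    return [s for s, bit in zip(sentences, best) if bit]
```

Calling `genetic_summarize(sentences, unigram_coverage, budget=100)` returns a subset of the input sentences within the word budget. Swapping in a different scoring function requires no change to the search loop, which is the property the framework relies on.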
