Extracting easy to understand summary using differential evolution algorithm

Abstract: This paper describes an optimization method based on the differential evolution algorithm and its novel application to extracting easy-to-understand summaries that improve text readability. The idea is to improve the readability of a given text for readers with reading difficulties by means of an assistive summary. To extract an easy-to-understand summary from the given text, an improved differential evolution algorithm is proposed. A new chromosome representation that considers ordering and similarity is introduced for extracting a cohesive summary, and modified crossover and mutation operators are designed to generate potential offspring. The differential evolution algorithm is applied to maximize the average similarity and the informative score of the candidate summary sentences. We applied the proposed algorithm to a corpus of educational text from ESL textbooks and to graded text. The results show that summaries generated using the differential evolution algorithm perform better in accuracy, readability, and lexical cohesion than existing techniques. A task-based evaluation by the target audience also indicates that the assistive summary has a significant effect on improving readability.
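
The abstract does not spell out the chromosome representation or the modified operators, so the following is only a minimal sketch of the general idea, not the authors' method. It assumes a classic DE/rand/1/bin scheme with a random-key decoding (the k largest vector entries select k sentences, kept in document order), Jaccard word overlap as the similarity measure, and per-sentence informative scores supplied by the caller. All function names and parameters (`de_summarize`, `info_scores`, `f`, `cr`, and so on) are hypothetical.

```python
import random
from itertools import combinations


def jaccard(a, b):
    """Word-overlap similarity between two token sets (an assumed measure)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def decode(vector, k):
    """Pick the k highest-keyed sentences, returned in document order."""
    top = sorted(range(len(vector)), key=lambda i: vector[i], reverse=True)[:k]
    return sorted(top)


def fitness(indices, token_sets, info_scores):
    """Average pairwise similarity plus average informative score."""
    pairs = list(combinations(indices, 2))
    if pairs:
        avg_sim = sum(jaccard(token_sets[i], token_sets[j]) for i, j in pairs) / len(pairs)
    else:
        avg_sim = 0.0
    avg_info = sum(info_scores[i] for i in indices) / len(indices)
    return avg_sim + avg_info


def de_summarize(sentences, info_scores, k, pop_size=20, generations=50,
                 f=0.5, cr=0.9, seed=0):
    """Classic DE/rand/1/bin over real-valued keys; requires pop_size >= 4."""
    rng = random.Random(seed)
    token_sets = [set(s.lower().split()) for s in sentences]
    n = len(sentences)

    def score(vec):
        return fitness(decode(vec, k), token_sets, info_scores)

    # Each individual is a vector of n real keys, one per sentence.
    pop = [[rng.random() for _ in range(n)] for _ in range(pop_size)]
    scores = [score(v) for v in pop]

    for _ in range(generations):
        for t in range(pop_size):
            # Mutation: combine three distinct individuals other than the target.
            a, b, c = rng.sample([i for i in range(pop_size) if i != t], 3)
            j_rand = rng.randrange(n)  # guarantees at least one mutated gene
            # Binomial crossover between the mutant and the target vector.
            trial = [
                pop[a][j] + f * (pop[b][j] - pop[c][j])
                if (rng.random() < cr or j == j_rand) else pop[t][j]
                for j in range(n)
            ]
            # Greedy selection: keep the trial if it is at least as fit.
            trial_score = score(trial)
            if trial_score >= scores[t]:
                pop[t], scores[t] = trial, trial_score

    best = max(range(pop_size), key=lambda i: scores[i])
    return decode(pop[best], k), scores[best]
```

Calling `de_summarize(sentences, info_scores, k=2)` returns the indices of the selected sentences in document order together with their fitness. The random-key decoding is one common way to apply a continuous optimizer to a selection problem; the paper's own representation may differ.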
