A Study on the Use of Genetic Programming for Automatic Text Summarization

Text summarization is the process of identifying and extracting the most important information in a document. It is regarded as an effective way of coping with the growing amount of information on the Internet. In this paper, we present an application of Genetic Programming to the problem of automatic text summarization. Genetic Programming was used to evolve a function that ranks the sentences in a document by their importance. The summary was then extracted by selecting the highest-ranked sentences. The experiment was conducted on a number of Vietnamese news documents. The results showed that the summaries created by Genetic Programming were better than those created by a number of statistics-based methods, and even better than those created by non-expert humans.
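The extraction step described above (score every sentence with a ranking function, then keep the highest-ranked ones) can be illustrated with a minimal sketch. Everything named here is an assumption made for illustration and is not taken from the paper: the three features (`position`, `length`, `avg_tf`), the whitespace tokenizer, and `example_ranker`, which is a hand-written stand-in for the kind of expression Genetic Programming might evolve. The evolutionary search itself is not shown.

```python
from typing import Callable, Dict, List

Features = Dict[str, float]


def sentence_features(sentences: List[str]) -> List[Features]:
    """Compute a few illustrative features per sentence (assumed, not from the paper)."""
    # Naive whitespace tokenization; real Vietnamese text would need word segmentation.
    tokens = [s.lower().split() for s in sentences]

    # Count how often each word occurs in the whole document.
    doc_freq: Dict[str, int] = {}
    for words in tokens:
        for w in words:
            doc_freq[w] = doc_freq.get(w, 0) + 1

    n = len(sentences)
    feats: List[Features] = []
    for i, words in enumerate(tokens):
        avg_tf = sum(doc_freq[w] for w in words) / len(words) if words else 0.0
        feats.append({
            "position": 1.0 - i / n,      # earlier sentences get a higher value
            "length": float(len(words)),  # number of tokens in the sentence
            "avg_tf": avg_tf,             # mean document frequency of its words
        })
    return feats


def example_ranker(f: Features) -> float:
    """Hypothetical ranking function standing in for an evolved expression tree."""
    return f["avg_tf"] * f["position"] + 0.1 * f["length"]


def summarize(sentences: List[str],
              ranker: Callable[[Features], float],
              k: int) -> List[str]:
    """Return the k highest-ranked sentences, kept in their original order."""
    scores = [ranker(f) for f in sentence_features(sentences)]
    top = sorted(range(len(sentences)), key=lambda i: scores[i], reverse=True)[:k]
    return [sentences[i] for i in sorted(top)]


if __name__ == "__main__":
    doc = [
        "Genetic programming evolves ranking functions",
        "Ok",
        "Ranking functions score sentences in a document",
    ]
    for sentence in summarize(doc, example_ranker, k=2):
        print(sentence)
```

Because `summarize` accepts any callable from features to a score, a function produced by an evolutionary run could be passed in place of `example_ranker` without changing the selection step.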
