Variable Length-Based Genetic Representation to Automatically Evolve Wrappers

The Web has become the star service of the Internet. However, the sheer volume of available information and its decentralized nature make it intrinsically difficult to locate, extract and compose information. An automatic approach is required to handle this huge amount of data. In this paper we present a machine learning algorithm based on Genetic Algorithms that generates a set of complex wrappers able to extract information from the Web. The paper also presents an experimental evaluation of these wrappers over a set of basic data sets.
