Survey of genetic algorithms and genetic programming

This paper provides an introduction to genetic algorithms and genetic programming and lists sources of additional information, including books and conferences as well as e-mail lists and software that is available over the Internet.

1. GENETIC ALGORITHMS

John Holland's pioneering book Adaptation in Natural and Artificial Systems (1975, 1992) showed how the evolutionary process can be applied to solve a wide variety of problems using a highly parallel technique that is now called the genetic algorithm.

The genetic algorithm (GA) transforms a population (set) of individual objects, each with an associated fitness value, into a new generation of the population using the Darwinian principle of reproduction and survival of the fittest and analogs of naturally occurring genetic operations such as crossover (sexual recombination) and mutation. Each individual in the population represents a possible solution to a given problem. The genetic algorithm attempts to find a very good (or best) solution to the problem by genetically breeding the population of individuals over a series of generations.

Before applying the genetic algorithm to the problem, the user designs an artificial chromosome of a certain fixed size and then defines a mapping (encoding) between the points in the search space of the problem and instances of the artificial chromosome. For example, in applying the genetic algorithm to a multidimensional optimization problem (where the goal is to find the global optimum of an unknown multidimensional function), the artificial chromosome may be a linear character string (modeled directly after the linear string of information found in DNA). A specific location (a gene) along this artificial chromosome is associated with each of the variables of the problem. Character(s) appearing at a particular location along the chromosome denote the value of a particular variable (i.e., the gene value or allele).
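The mapping between chromosome and search space described above can be illustrated with a small sketch. It assumes a binary alphabet, an equal number of bits per gene, and a linear scaling of each gene onto a real interval; none of these choices comes from the paper, and the name `decode` and its parameters are hypothetical.

```python
from typing import List, Tuple


def decode(chromosome: str, bits_per_gene: int,
           bounds: List[Tuple[float, float]]) -> List[float]:
    """Map a fixed-length binary chromosome to one real value per variable.

    Assumption (not from the paper): each gene is `bits_per_gene` characters
    long and is scaled linearly onto the interval given in `bounds`.
    """
    if len(chromosome) != bits_per_gene * len(bounds):
        raise ValueError("chromosome length must be bits_per_gene * number of variables")
    max_int = 2 ** bits_per_gene - 1
    values = []
    for i, (low, high) in enumerate(bounds):
        # The gene is the substring at this variable's location on the chromosome.
        gene = chromosome[i * bits_per_gene:(i + 1) * bits_per_gene]
        allele = int(gene, 2)  # the gene value (allele) as an integer
        values.append(low + (high - low) * allele / max_int)
    return values


if __name__ == "__main__":
    # Two variables, four bits each: the first on [0, 1], the second on [-1, 1].
    print(decode("00001111", 4, [(0.0, 1.0), (-1.0, 1.0)]))  # [0.0, 1.0]
```

Other encodings (for example Gray codes or larger alphabets) fit the same description; the linear binary scaling is used here only because it is short.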
Each individual in the population has a fitness value (which, for a multidimensional optimization problem, is the value of the unknown function). The genetic algorithm then manipulates a population of such artificial chromosomes (usually starting from a randomly-created initial population of strings) using the operations of reproduction, crossover, and mutation. Individuals are probabilistically selected to participate in these genetic operations based on their fitness. The goal of the genetic algorithm in a multidimensional optimization problem is to find an artificial chromosome which, when decoded and mapped back into the search space of the problem, corresponds to a globally optimum (or near-optimum) point in the original search space of the problem.

In preparing to use the conventional genetic algorithm operating on fixed-length character strings to solve a problem, the user must (1) determine the representation scheme, (2) determine the fitness measure, (3) determine the parameters and variables for controlling the algorithm, and (4) determine a way of designating the result and a criterion for terminating a run.

In the conventional genetic algorithm, the individuals in the population are usually fixed-length character strings patterned after chromosome strings. Thus, specification of the representation scheme in the conventional genetic algorithm starts with a selection of the string length L and the alphabet size K. Often the alphabet is binary, so K equals 2. The most important part of the representation scheme is the mapping that expresses each possible point in the search space of the problem as a fixed-length character string (i.e., as a chromosome) and each chromosome as a point in the search space of the problem. Selecting a representation scheme that facilitates solution of the problem by the genetic algorithm often requires considerable insight into the problem and good judgment.

The evolutionary process is driven by the fitness measure.
The fitness measure assigns a fitness value to each possible fixed-length character string in the population.

The primary parameters for controlling the genetic algorithm are the population size, M, and the maximum number of generations to be run, G. Populations can consist of hundreds, thousands, tens of thousands or more individuals. There can be dozens, hundreds, thousands, or more generations in a run of the genetic algorithm.

Each run of the genetic algorithm requires specification of a termination criterion for deciding when to terminate a run and a method of result designation. One frequently used method of result designation for a run of the genetic algorithm is to designate the best individual obtained in any generation of the population during the run (i.e., the best-so-far individual) as the result of the run.

Once the four preparatory steps for setting up the genetic algorithm have been completed, the genetic algorithm can be run.

The evolutionary process described above indicates how a globally optimum combination of alleles (gene values) within a fixed-size chromosome can be evolved.

The three steps in executing the genetic algorithm operating on fixed-length character strings are as follows:

(1) Randomly create an initial population of individual fixed-length character strings.
(2) Iteratively perform the following substeps on the population of strings until the termination criterion has been satisfied:
    (A) Assign a fitness value to each individual in the population using the fitness measure.
    (B) Create a new population of strings by applying the following three genetic operations. The genetic operations are applied to individual string(s) in the population chosen with a probability based on fitness.
        (i) Reproduce an existing individual string by copying it into the new population.
        (ii) Create two new strings from two existing strings by genetically recombining substrings using the crossover operation (described below) at a randomly chosen crossover point.
        (iii) Create a new string from an existing string by randomly mutating the character at one randomly chosen position in the string.
(3) The string that is identified by the method of result designation (e.g., the best-so-far individual) is designated as the result of the genetic algorithm for the run. This result may represent a solution (or an approximate solution) to the problem.

ISBN# 0-7803-2636-9

The genetic operation of reproduction is based on the Darwinian principle of reproduction and survival of the fittest. In the reproduction operation, an individual is probabilistically selected from the population based on its fitness (with reselection allowed) and then the individual is copied, without change, into the next generation of the population. The selection is done in such a way that the better an individual's fitness, the more likely it is to be selected. An important aspect of this probabilistic selection is that every individual, however poor its fitness, has some probability of selection.

The genetic operation of crossover (sexual recombination) allows new individuals (i.e., new points in the search space) to be created and tested. The operation of crossover starts with two parents independently selected probabilistically from the population based on their fitness (with reselection allowed). As before, the selection is done in such a way that the better an individual's fitness, the more likely it is to be selected. The crossover operation produces two offspring. Each offspring contains some genetic material from each of its parents.

Suppose that the crossover operation is to be applied to the two parental strings 10110 and 01101 of length L = 5 over an alphabet of size K = 2. The crossover operation begins by randomly selecting a number between 1 and L - 1 using a uniform probability distribution; this number identifies one of the L - 1 = 4 interstitial locations between adjacent characters of the string.
Suppose that the third interstitial location is selected. This location becomes the crossover point. Each parent is then split at this crossover point into a crossover fragment and a remainder: parent 1 yields crossover fragment 1 (101) and remainder 1 (10), and parent 2 yields crossover fragment 2 (011) and remainder 2 (01). The crossover operation then recombines remainder 1 (i.e., 10) with crossover fragment 2 (i.e., 011) to create offspring 2 (i.e., 01110). The crossover operation similarly recombines remainder 2 (i.e., 01) with crossover fragment 1 (i.e., 101) to create offspring 1 (i.e., 10101).

The operation of mutation allows new individuals to be created. It begins by selecting an individual from the population based on its fitness (with reselection allowed). A point along the string is selected at random and the character at that point is randomly changed. The altered individual is then copied into the next generation of the population. Mutation is used very sparingly in genetic algorithm work.

The genetic algorithm works in a domain-independent way on the fixed-length character strings in the population. The genetic algorithm searches the space of possible character strings in an attempt to find high-fitness strings. The fitness landscape may be very rugged and nonlinear. To guide this search, the genetic algorithm uses only the numerical fitness values associated with the explicitly tested strings in the population. Regardless of the particular problem domain, the genetic algorithm carries out its search by performing the same disarmingly simple operations of copying, recombining, and occasionally randomly mutating the strings.

In practice, the genetic algorithm is surprisingly rapid in effectively searching complex, highly nonlinear, multidimensional search spaces. This is all the more surprising because the genetic algorithm does not know anything about the problem domain or the internal workings of the fitness measure being used.
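The three genetic operations and the overall run can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the paper's implementation: the alphabet is binary, selection is fitness-proportionate over non-negative fitness values, mutation flips one character, and the operation probabilities `p_cross` and `p_mut` and all function names are hypothetical choices made for the example.

```python
import random
from typing import Callable, List, Tuple


def select(population: List[str], scores: List[float], rng: random.Random) -> str:
    """Fitness-proportionate selection with reselection allowed."""
    # Assumption: scores are non-negative. The small constant keeps every
    # individual selectable, however poor its fitness.
    weights = [s + 1e-9 for s in scores]
    return rng.choices(population, weights=weights, k=1)[0]


def crossover(parent1: str, parent2: str, point: int) -> Tuple[str, str]:
    """Single-point crossover at interstitial location `point` (1 .. L-1)."""
    if len(parent1) != len(parent2) or not 1 <= point <= len(parent1) - 1:
        raise ValueError("parents must have equal length L and 1 <= point <= L-1")
    fragment1, remainder1 = parent1[:point], parent1[point:]
    fragment2, remainder2 = parent2[:point], parent2[point:]
    # Offspring 1 = fragment 1 + remainder 2; offspring 2 = fragment 2 + remainder 1.
    return fragment1 + remainder2, fragment2 + remainder1


def mutate(individual: str, rng: random.Random) -> str:
    """Flip the character at one randomly chosen position (binary alphabet)."""
    position = rng.randrange(len(individual))
    flipped = "1" if individual[position] == "0" else "0"
    return individual[:position] + flipped + individual[position + 1:]


def run_ga(fitness: Callable[[str], float], length: int, pop_size: int,
           generations: int, p_cross: float = 0.9, p_mut: float = 0.01,
           seed: int = 0) -> str:
    """Run the three-step procedure and return the best-so-far individual."""
    rng = random.Random(seed)
    # Step (1): random initial population of fixed-length strings.
    population = ["".join(rng.choice("01") for _ in range(length))
                  for _ in range(pop_size)]
    best = max(population, key=fitness)
    # Step (2): here the termination criterion is simply the generation limit G.
    for _ in range(generations):
        scores = [fitness(ind) for ind in population]  # substep (A)
        new_population: List[str] = []                 # substep (B)
        while len(new_population) < pop_size:
            r = rng.random()
            if r < p_cross:
                a = select(population, scores, rng)
                b = select(population, scores, rng)
                new_population.extend(crossover(a, b, rng.randint(1, length - 1)))
            elif r < p_cross + p_mut:
                new_population.append(mutate(select(population, scores, rng), rng))
            else:
                new_population.append(select(population, scores, rng))
        population = new_population[:pop_size]
        best = max(population + [best], key=fitness)
    # Step (3): the best-so-far individual is designated as the result.
    return best


if __name__ == "__main__":
    # The worked example from the text: parents 10110 and 01101, crossover point 3.
    print(crossover("10110", "01101", 3))  # ('10101', '01110')
    # Toy fitness measure (an assumption): the number of 1s in the string.
    result = run_ga(lambda s: s.count("1"), length=20, pop_size=50, generations=30)
    print(result, result.count("1"))
```

Only the fitness function knows anything about the problem; `select`, `crossover`, and `mutate` operate on the strings alone, which is the domain independence the text describes.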
1.1 Sources of Additional Information

David Goldberg's Genetic Algorithms in Search, Optimization, and Machine Learning (1989) is the leading textbook and best single source of additional information about the field of genetic algorithms. Further information on genetic algorithms can be found in Davis (1987, 1991), Michalewicz (1992), and Buckles and Petry (1992). The proceedings of th

[1]  Juan Julián Merelo Guervós,et al.  Proceedings of the 7th International Conference on Parallel Problem Solving from Nature , 1996 .

[2]  Lawrence Davis,et al.  Handbook Of Genetic Algorithms , 1990 .

[3]  John R. Koza,et al.  Genetic programming - on the programming of computers by means of natural selection , 1993, Complex adaptive systems.

[4]  Rodney A. Brooks,et al.  Artificial Life IV: Proceedings of the Fourth International Workshop on the Synthesis and Simulation of Living Systems , 1994 .

[5]  John R. Koza,et al.  Gene Duplication to Enable Genetic Programming to Concurrently Evolve Both the Architecture and Work-Performing Steps of a Computer Program , 1995, IJCAI.

[6]  Reinhard Männer,et al.  Parallel Problem Solving from Nature — PPSN III , 1994, Lecture Notes in Computer Science.

[7]  K. Maekawa Proceedings of the First IEEE Conference on Evolutionary Computation, IEEE World Congress on Computational Intelligence, Orlando, Florida, USA, June 27-29, 1994 , 1994, International Conference on Evolutionary Computation.

[8]  Lashon B. Booker,et al.  Proceedings of the fourth international conference on Genetic algorithms , 1991 .

[9]  Michael de la Maza,et al.  Book review: Genetic Algorithms + Data Structures = Evolution Programs by Zbigniew Michalewicz (Springer-Verlag, 1992) , 1993 .

[10]  David E. Goldberg,et al.  Genetic Algorithms in Search Optimization and Machine Learning , 1988 .

[11]  J. K. Kinnear,et al.  Advances in Genetic Programming , 1994 .

[12]  John H. Holland,et al.  Adaptation in Natural and Artificial Systems: An Introductory Analysis with Applications to Biology, Control, and Artificial Intelligence , 1992 .

[13]  In Schoenauer,et al.  Parallel Problem Solving from Nature , 1990, Lecture Notes in Computer Science.

[14]  John R. Koza,et al.  Genetic programming 2 - automatic discovery of reusable programs , 1994, Complex Adaptive Systems.

[15]  Arthur L. Samuel,et al.  Some studies in machine learning using the game of checkers, in Computers and Thought , 1995 .

[16]  John R. Koza,et al.  Parallel Genetic Programming on a Network of Transputers , 1995 .

[17]  Lawrence J. Fogel,et al.  Proceedings of the Third Annual Conference on Evolutionary Programming, 24-26 Feb 94, San Diego, California, USA , 1994 .

[18]  Zbigniew Michalewicz,et al.  Genetic Algorithms + Data Structures = Evolution Programs , 1996, Springer Berlin Heidelberg.

[19]  Stephanie Forrest,et al.  Parallelism and programming in classifier systems , 1990 .

[20]  John R. Koza,et al.  Genetic programming II (videotape): the next generation , 1994 .

[21]  Lawrence Davis,et al.  Genetic Algorithms and Simulated Annealing , 1987 .

[22]  Frederick E. Petry,et al.  Genetic Algorithms , 1992 .

[23]  D. Cliff From animals to animats 3 : proceedings of the Third International Conference on Simulation of Adaptive Behavior , 1994 .

[24]  Una-May O'Reilly,et al.  Genetic Programming II: Automatic Discovery of Reusable Programs. , 1994, Artificial Life.

[25]  John R. Koza,et al.  Genetic Programming: The Movie , 1992 .

[26]  Stephanie Forrest,et al.  Proceedings of the 5th International Conference on Genetic Algorithms , 1993 .

[27]  John J. Grefenstette,et al.  Genetic algorithms and their applications , 1987 .

[28]  Georges R. Harik,et al.  Foundations of Genetic Algorithms , 1997 .

[29]  Yuval Davidor,et al.  Genetic algorithms and robotics , 1991 .

[30]  David W. Pearson,et al.  Artificial Neural Nets and Genetic Algorithms , 1995, Springer Vienna.

[31]  Ron Shonkwiler,et al.  Parallel Genetic Algorithms , 1993, ICGA.

[32]  Arthur L. Samuel,et al.  Some Studies in Machine Learning Using the Game of Checkers , 1967, IBM J. Res. Dev..

[33]  Robert G. Reynolds,et al.  Evolutionary Programming IV: Proceedings of the Fourth Annual Conference on Evolutionary Programming , 1995 .