Background on Genetic Algorithms

This chapter introduces evolutionary computation and genetic algorithms, starting at a high level. It uses the schema sampling theorem to give an intuitive account of how evolution, operating on a population of chromosomes (symbol strings), produces offspring in each generation that contain variants of the symbol patterns found in the fitter parents, and it shows how recombination operators are biased for and against particular patterns. The No Free Lunch (NFL) theorem of Wolpert and Macready for search and optimization algorithms shows that, averaged over the space of all possible problems, no algorithm can be universally superior. It is therefore incumbent on the developers of any algorithm to identify the domain of problems for which it is effective, along with its strengths and limitations. The next section introduces Eshelman’s CHC genetic algorithm and the recombination operators that have been developed for bit-string and integer chromosomes. We show its strengths, particularly in dealing with some of the challenges that face traditional genetic algorithms, and then its limitations. The final section takes up the application of CHC to subset selection problems, a domain of considerable utility for many machine learning applications. We present a series of empirical tests that lead us to the index chromosome representation and the match and mix set-subset size (MMX_SSS) recombination operator, which seem well suited to this domain. Variants are shown for the cases where the size of the desired subset is known and where it is not. In later chapters we apply this algorithm to the feature subset selection problem that is key to our application: developing a speech-based diagnostic test for dementia.
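As a rough illustration of the kind of bit-string recombination CHC relies on, the following is a minimal sketch of half uniform crossover (HUX) with an incest-prevention check, as generally described for Eshelman's CHC. It is not the chapter's own code: the function names `hamming` and `hux`, the `threshold` parameter, and the list-of-bits representation are assumptions made for this example. In published descriptions of CHC the threshold typically starts at a quarter of the chromosome length and is lowered when a generation produces no surviving offspring; that outer loop is omitted here.

```python
import random


def hamming(a, b):
    """Number of positions at which two equal-length bit strings differ."""
    return sum(x != y for x, y in zip(a, b))


def hux(parent_a, parent_b, threshold, rng=random):
    """Half uniform crossover (HUX) with an incest-prevention check.

    Returns a pair of children, or None when the parents are too
    similar to be allowed to mate. (Hypothetical helper for illustration.)
    """
    # Positions where the two parents disagree.
    diff = [i for i, (x, y) in enumerate(zip(parent_a, parent_b)) if x != y]
    # Incest prevention: mate only if half the Hamming distance
    # exceeds the current difference threshold.
    if len(diff) / 2 <= threshold:
        return None
    child_a, child_b = list(parent_a), list(parent_b)
    # Swap exactly half of the differing positions, chosen at random.
    for i in rng.sample(diff, len(diff) // 2):
        child_a[i], child_b[i] = child_b[i], child_a[i]
    return child_a, child_b
```

For two maximally different 8-bit parents and a threshold of 2 (one quarter of the length), `hux([0] * 8, [1] * 8, threshold=2)` returns two children that each sit at Hamming distance 4 from both parents, whereas parents differing in only one bit are rejected and the call returns `None`.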
