Development and Evaluation of an Open-Ended Computational Evolution System for the Genetic Analysis of Susceptibility to Common Human Diseases

An important goal of human genetics is to identify DNA sequence variations that are predictive of susceptibility to common human diseases. This is a classification problem in which the data consist of discrete attributes and a binary outcome. A variety of machine learning methods based on artificial evolution have been developed and applied to modeling the relationship between genotype and phenotype. While artificial evolution approaches show promise, they are far from perfect and are only loosely based on real biological and evolutionary processes. It has recently been suggested that a new paradigm is needed in which "artificial evolution" is transformed into "computational evolution" (CE) by incorporating more biological and evolutionary complexity into existing algorithms, and that CE systems will be more likely to solve problems of interest to biologists and biomedical researchers. The goal of the present study was to develop and evaluate a prototype CE system for the analysis of human genetics data. Here we describe this new open-ended CE system and provide initial results from a simulation study suggesting that more complex operators result in better solutions.
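
To make the classification setting concrete, the following is a minimal sketch of data with discrete attributes and a binary outcome. It is an illustration under stated assumptions, not the CE system or the simulation model used in the study: the genotype coding (0, 1, 2), the two-locus interaction rule, and the names simulate, accuracy, single_locus, and two_locus are all hypothetical.

```python
import random


def simulate(n, seed=0):
    """Generate hypothetical toy data: two loci coded 0/1/2 and a binary outcome."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        g1 = rng.choice([0, 1, 2])
        g2 = rng.choice([0, 1, 2])
        # Assumed interaction rule, for illustration only:
        # a subject is a case when exactly one locus is heterozygous.
        status = int((g1 == 1) != (g2 == 1))
        rows.append((g1, g2, status))
    return rows


def accuracy(classifier, rows):
    """Fraction of subjects whose case/control status is predicted correctly."""
    correct = sum(classifier(g1, g2) == status for g1, g2, status in rows)
    return correct / len(rows)


def single_locus(g1, g2):
    """Classifier that looks at the first locus only."""
    return int(g1 == 1)


def two_locus(g1, g2):
    """Classifier that models the assumed interaction between both loci."""
    return int((g1 == 1) != (g2 == 1))


data = simulate(1000)
acc_single = accuracy(single_locus, data)
acc_joint = accuracy(two_locus, data)
print(f"single-locus accuracy: {acc_single:.3f}")
print(f"two-locus accuracy:    {acc_joint:.3f}")
```

In this toy setting a classifier that considers both attributes jointly recovers the outcome exactly, whereas one restricted to a single attribute cannot. This is the kind of genotype-phenotype relationship that evolutionary search methods are asked to discover from the data.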
