Mask functions for the symbolic modeling of epistasis using genetic programming

The study of common, complex multifactorial diseases in genetic epidemiology is complicated by nonlinearity in the genotype-to-phenotype mapping that is due, in part, to epistasis, or gene-gene interaction. Symbolic discriminant analysis (SDA) is a flexible modeling approach that uses genetic programming (GP) to evolve an optimal predictive model from a predefined collection of mathematical functions, constants, and attributes. SDA has been shown to be an effective strategy for modeling epistasis. In the present study, we introduce the genetic "mask" as a novel building block that exploits expert knowledge in the form of a pre-constructed relationship between two attributes. The goal of this study was to determine whether the availability of "mask" building blocks improves SDA performance. The results of this study support the idea that pre-processing data improves GP performance.
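To make the idea of a "mask" concrete, the following is a minimal sketch, not the authors' implementation. It assumes that each attribute is a SNP genotype coded 0, 1, or 2, and that a mask can be represented as a 3x3 lookup table that maps a pair of genotypes to a single binary value. The names `make_mask`, `apply_mask`, and `XOR_TABLE`, and the particular table values, are hypothetical and chosen only for illustration.

```python
from typing import Callable, List, Sequence

# Assumption: genotypes are coded 0, 1, 2 (copies of the minor allele).
Mask = Callable[[int, int], int]


def make_mask(table: Sequence[Sequence[int]]) -> Mask:
    """Build a two-attribute mask from a 3x3 lookup table (hypothetical helper)."""
    if len(table) != 3 or any(len(row) != 3 for row in table):
        raise ValueError("mask table must be 3x3")

    def mask(g1: int, g2: int) -> int:
        # Row index is the first genotype, column index the second.
        return table[g1][g2]

    return mask


# Illustrative XOR-like interaction pattern (an assumption, not data from the study):
# the output is 1 when exactly one of the two genotypes is heterozygous.
XOR_TABLE = [
    [0, 1, 0],
    [1, 0, 1],
    [0, 1, 0],
]
xor_mask = make_mask(XOR_TABLE)


def apply_mask(mask: Mask, snp_a: Sequence[int], snp_b: Sequence[int]) -> List[int]:
    """Collapse two genotype columns into one constructed attribute."""
    return [mask(a, b) for a, b in zip(snp_a, snp_b)]


constructed = apply_mask(xor_mask, [0, 1, 2], [1, 1, 1])
```

Under these assumptions, a GP function set could include `xor_mask` alongside arithmetic operators, so that an evolved model can use a pre-constructed two-locus relationship as a single node rather than having to rediscover it from primitive functions.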
