ATHENA: A knowledge-based hybrid backpropagation-grammatical evolution neural network algorithm for discovering epistasis among quantitative trait loci

Background: Growing interest and burgeoning technology for discovering genetic mechanisms that influence disease processes have ushered in a flood of genetic association studies over the last decade, yet little heritability in highly studied complex traits has been explained by genetic variation. Non-additive gene-gene interactions, which are not often explored, are thought to be one source of this "missing" heritability.

Methods: Stochastic methods employing evolutionary algorithms have demonstrated promise in being able to detect and model gene-gene and gene-environment interactions that influence human traits. Here we demonstrate modifications to a neural network algorithm in ATHENA (the Analysis Tool for Heritable and Environmental Network Associations) resulting in clear performance improvements for discovering gene-gene interactions that influence human traits. We employed an alternative tree-based crossover, backpropagation for locally fitting neural network weights, and incorporation of domain knowledge obtainable from publicly accessible biological databases for initializing the search for gene-gene interactions. We tested these modifications in silico using simulated datasets.

Results: We show that the alternative tree-based crossover modification resulted in a modest increase in the sensitivity of the ATHENA algorithm for discovering gene-gene interactions. The performance increase was highly statistically significant when backpropagation was used to locally fit neural network (NN) weights. We also demonstrate that using domain knowledge to initialize the search for gene-gene interactions results in a large performance increase, especially when the search space is larger than the search coverage.

Conclusions: We show that a hybrid optimization procedure, alternative crossover strategies, and incorporation of domain knowledge from publicly available biological databases can result in marked increases in sensitivity and performance of the ATHENA algorithm for detecting and modelling gene-gene interactions that influence a complex human trait.
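The hybrid procedure summarized above pairs a global evolutionary search over candidate models with backpropagation as a local weight-fitting step, and optionally starts the search from knowledge-derived candidates. A minimal Python sketch of that division of labour follows. It is not ATHENA's implementation: the grammar-encoded genome and tree-based crossover are replaced here by a plain mutation over pairs of loci, and every name (`evolve_pairs`, `backprop_fit`, `seed_pairs`), the network size, the initial weight range, and the learning rate are assumptions chosen only for illustration.

```python
import math
import random

def predict(net, x):
    """Forward pass: one tanh hidden layer, linear output. net = [W, b, v, c]."""
    W, b, v, c = net
    hidden = [math.tanh(sum(w * xi for w, xi in zip(row, x)) + bj)
              for row, bj in zip(W, b)]
    return sum(vj * hj for vj, hj in zip(v, hidden)) + c, hidden

def mse(net, X, y):
    """Mean squared error of the network on a quantitative trait."""
    return sum((predict(net, x)[0] - t) ** 2 for x, t in zip(X, y)) / len(y)

def random_net(n_in, n_hidden, rng):
    """Small random starting weights (assumed initialisation range)."""
    u = lambda: rng.uniform(-0.5, 0.5)
    return [[[u() for _ in range(n_in)] for _ in range(n_hidden)],
            [u() for _ in range(n_hidden)],
            [u() for _ in range(n_hidden)],
            u()]

def backprop_fit(net, X, y, epochs=50, lr=0.05):
    """Local search: refine the weights by per-sample gradient descent."""
    W = [list(row) for row in net[0]]
    b, v, c = list(net[1]), list(net[2]), net[3]
    for _ in range(epochs):
        for x, t in zip(X, y):
            out, hidden = predict([W, b, v, c], x)
            err = out - t
            for j, h in enumerate(hidden):
                grad_pre = err * v[j] * (1.0 - h * h)  # uses v[j] before its update
                v[j] -= lr * err * h
                b[j] -= lr * grad_pre
                for i, xi in enumerate(x):
                    W[j][i] -= lr * grad_pre * xi
            c -= lr * err
    return [W, b, v, c]

def fitness(pair, G, y, rng):
    """Fit a network on the two chosen loci and return its training error."""
    X = [[row[pair[0]], row[pair[1]]] for row in G]
    return mse(backprop_fit(random_net(2, 3, rng), X, y), X, y)

def evolve_pairs(G, y, seed_pairs=(), pop_size=6, generations=4, seed=0):
    """Global search over locus pairs; seed_pairs stands in for domain knowledge."""
    rng = random.Random(seed)
    n_loci = len(G[0])
    pop = [tuple(p) for p in seed_pairs][:pop_size]
    while len(pop) < pop_size:
        pop.append(tuple(rng.sample(range(n_loci), 2)))
    best = None
    for _ in range(generations):
        scored = sorted((fitness(p, G, y, rng), p) for p in pop)
        if best is None or scored[0][0] < best[0]:
            best = scored[0]
        parents = [p for _, p in scored[: pop_size // 2]]
        children = []
        for p in parents:  # mutation: swap one locus for a different random one
            keep = p[rng.randrange(2)]
            other = rng.choice([i for i in range(n_loci) if i != keep])
            children.append((keep, other))
        pop = parents + children
    return best  # (error, (locus_a, locus_b))
```

Here `G` is a list of genotype rows coded 0/1/2 and `y` the quantitative trait. Passing `seed_pairs=[(1, 3)]` places a knowledge-derived candidate in the initial population instead of leaving the start entirely random, which is the role the abstract assigns to domain knowledge when the search space exceeds what the search can cover.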
