Machine Learning for Detecting Gene-Gene Interactions

Complex interactions among genes and environmental factors are known to play a role in common human disease aetiology. There is a growing body of evidence to suggest that complex interactions are ‘the norm’ and, rather than amounting to a small perturbation to classical Mendelian genetics, interactions may be the predominant effect. Traditional statistical methods are not well suited for detecting such interactions, especially when the data are high dimensional (many attributes or independent variables) or when interactions occur between more than two polymorphisms. In this review, we discuss machine-learning models and algorithms for identifying and characterising susceptibility genes in common, complex, multifactorial human diseases. We focus on the following machine-learning methods that have been used to detect gene-gene interactions: neural networks, cellular automata, random forests, and multifactor dimensionality reduction. We conclude with some ideas about how these methods and others can be integrated into a comprehensive and flexible framework for data mining and knowledge discovery in human genetics.
