A survey about methods dedicated to epistasis detection

Over the past decade, findings from genome-wide association studies (GWAS) have improved our knowledge and understanding of disease genetics. To date, thousands of SNPs have been associated with diseases and other complex traits. Statistical analysis typically looks for an association between a phenotype and each SNP taken individually, via single-locus tests. However, geneticists acknowledge that this approach oversimplifies the complexity of the underlying biological mechanisms. Interactions between SNPs, known as epistasis, must also be considered. Unfortunately, epistasis detection raises analytical challenges, because analyzing every SNP combination is currently impractical at a genome-wide scale. In this review, we present the main strategies recently proposed to detect epistatic interactions, along with their operating principles. Some of these methods are exhaustive, such as multifactor dimensionality reduction, likelihood ratio-based tests, or receiver operating characteristic curve analysis; others are non-exhaustive, such as machine learning techniques (random forests, Bayesian networks) or combinatorial optimization approaches (ant colony optimization, computational evolution system).
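To give a sense of why exhaustive analysis of every SNP combination is hard at a genome-wide scale, the short sketch below counts the candidate SNP sets of a given interaction order. The panel size of 500,000 SNPs and the helper name `n_combinations` are illustrative assumptions, not figures or code taken from the review.

```python
from math import comb


def n_combinations(n_snps: int, order: int) -> int:
    """Number of distinct unordered SNP sets of size `order` among `n_snps` SNPs."""
    return comb(n_snps, order)


if __name__ == "__main__":
    n_snps = 500_000  # assumed panel size, for illustration only
    for order in (2, 3):
        total = n_combinations(n_snps, order)
        print(f"order {order}: {total:.3e} candidate SNP sets")
```

Under this assumption, pairwise analysis alone already involves on the order of 10^11 tests, and each additional interaction order multiplies the count by several orders of magnitude, which is the motivation for the non-exhaustive strategies listed above.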
