How to increase our belief in discovered statistical interactions via large-scale association studies?

The understanding that differences in biological epistasis may impact disease risk, diagnosis, or disease management stands in stark contrast to the lack of widely accepted protocols for large-scale epistasis analysis. Several choices in the analysis workflow affect false-positive and false-negative rates. One such choice is the modelling or testing strategy. The strengths and limitations of these strategies, and the contexts in which they hold, need to be well understood. Such understanding will help determine the potentially complementary value of epistasis detection workflows and is expected to improve the replication of biologically relevant findings. In this contribution, we take a recently introduced regression-based epistasis detection tool as a leading example to review the key elements that must be considered to fully appreciate the value of analytical performance assessments of epistasis detection. We point out unresolved hurdles and give our perspectives on overcoming them.
