Performance of epistasis detection methods in semi-simulated GWAS

Background

Part of the missing heritability in Genome Wide Association Studies (GWAS) is expected to be explained by interactions between genetic variants, also called epistasis. Various statistical methods have been developed to detect epistasis in case-control GWAS. These methods face major statistical challenges due to the number of tests required, the complexity of the Linkage Disequilibrium (LD) structure, and the lack of consensus regarding the definition of epistasis. Their limited impact in terms of uncovering new biological knowledge might be explained in part by the limited amount of experimental data available to validate their statistical performance in a realistic GWAS context. In this paper, we introduce a simulation pipeline for generating real-scale GWAS data, including epistasis and realistic LD structure. We evaluate five exhaustive bivariate interaction methods: fastepi, GBOOST, SHEsisEpi, DSS, and IndOR. Two hundred thirty-four different disease scenarios are considered in extensive simulations. We report the performance of each method in terms of false positive rate control, power, area under the ROC curve (AUC), and computation time using a GPU. Finally, we compare the results of each method on a real GWAS of type 2 diabetes from the Wellcome Trust Case Control Consortium (WTCCC).

Results

GBOOST, SHEsisEpi, and DSS provide satisfactory control of the false positive rate. Under our definition of epistasis, fastepi and IndOR show an increased false positive rate in the presence of LD between causal SNPs. DSS performs best in terms of power and AUC in most scenarios with no or weak LD between causal SNPs. All methods can exhaustively analyze a GWAS with 6 × 10^5 SNPs and 15,000 samples in a couple of hours using a GPU.

Conclusion

This study confirms that computation time is no longer a limiting factor for performing an exhaustive search for epistasis in large GWAS. For this task, using DSS on SNP pairs with limited LD seems to be a good strategy to achieve the best statistical performance. A combination approach using both DSS and GBOOST is supported by the simulation results, and the analysis of the WTCCC dataset demonstrated that this approach can detect distinct genes in epistasis. Finally, weak epistasis between common variants will be detectable with existing methods when GWAS of a few tens of thousands of cases and controls are available.
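To give a sense of the "number of tests required" by an exhaustive bivariate search, the short sketch below counts the SNP pairs for a GWAS of about 600,000 SNPs, which is how we read the SNP count quoted in the abstract. The Bonferroni correction at a family-wise level of 0.05 is our own illustrative assumption and is not stated to be the threshold used in the study; the function names are hypothetical.

```python
from math import comb


def pairwise_test_count(n_snps: int) -> int:
    """Number of unordered SNP pairs tested by an exhaustive bivariate scan."""
    return comb(n_snps, 2)


def bonferroni_threshold(n_tests: int, alpha: float = 0.05) -> float:
    """Per-test significance level under a Bonferroni correction.

    Assumption for illustration only: the source does not say which
    multiple-testing correction was applied.
    """
    return alpha / n_tests


n_snps = 600_000  # assumed reading of the SNP count quoted in the abstract
n_pairs = pairwise_test_count(n_snps)
threshold = bonferroni_threshold(n_pairs)

print(f"SNP pairs to test: {n_pairs:,}")
print(f"Bonferroni per-test threshold: {threshold:.2e}")
```

With these assumptions the scan covers roughly 1.8 × 10^11 pairs, which illustrates why both GPU acceleration and strict false positive rate control matter for this kind of analysis.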
