Performance analysis of novel methods for detecting epistasis

Background
Epistasis is recognized as fundamentally important for understanding the mechanism of disease-causing genetic variation. Although many novel methods for detecting epistasis have been proposed, few studies have compared them. A comprehensive comparison study is therefore an urgent task and a step toward bringing these methods into real applications.

Results
This paper presents a comparison study of epistasis detection methods, carried out by applying the related software packages to datasets. For this purpose, we categorize methods according to their search strategies and select five representative methods that originate from different underlying techniques (TEAM, BOOST, SNPRuler, AntEpiSeeker and epiMODE) for comparison. The methods are tested on simulated datasets of different sizes, under various epistasis models, and with and without noise. The types of noise include missing data, genotyping error and phenocopy. Performance is evaluated by detection power (for which three forms are introduced), robustness, sensitivity and computational complexity.

Conclusions
None of the selected methods is perfect in all scenarios; each has its own merits and limitations. In terms of detection power, AntEpiSeeker performs best at detecting epistasis displaying marginal effects (eME), and BOOST performs best at identifying epistasis displaying no marginal effects (eNME). In terms of robustness, AntEpiSeeker is robust to all types of noise on eME models; BOOST is robust to genotyping error and phenocopy on eNME models; and SNPRuler is robust to phenocopy on eME models and to missing data on eNME models. In terms of sensitivity, AntEpiSeeker is the winner on eME models, and both SNPRuler and BOOST perform well on eNME models. In terms of computational complexity, BOOST is the fastest of the methods. In terms of overall performance, AntEpiSeeker and BOOST are recommended as efficient and effective methods. This comparison study may provide guidelines for applying the methods and further clues for epistasis detection.
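
The distinction between eME and eNME models can be made concrete with a two-locus penetrance table. A model displays marginal effects at a locus if the disease risk, averaged over the genotypes of the other locus, differs across that locus's genotypes. The following is a minimal sketch, not the paper's simulation code. It assumes genotypes coded 0/1/2 (copies of the minor allele), Hardy-Weinberg equilibrium at each locus, and no linkage disequilibrium between the two loci. The function names and the penetrance values are hypothetical illustrations, not the epistasis models used in the study.

```python
# Hypothetical sketch: does a two-locus penetrance table display marginal effects?
# Assumptions: genotypes coded 0/1/2 (minor-allele copies), Hardy-Weinberg
# equilibrium at each locus, and the two loci in linkage equilibrium.


def hwe_genotype_freqs(maf):
    """Genotype frequencies for 0, 1, 2 minor alleles under Hardy-Weinberg equilibrium."""
    q = maf
    p = 1.0 - q
    return [p * p, 2 * p * q, q * q]


def marginal_penetrances(table, maf_a, maf_b):
    """Return (P(D | genotype at A), P(D | genotype at B)).

    table[i][j] is P(disease | locus A has genotype i, locus B has genotype j).
    """
    fa = hwe_genotype_freqs(maf_a)
    fb = hwe_genotype_freqs(maf_b)
    # Average over the other locus, weighting by its genotype frequencies.
    marg_a = [sum(table[i][j] * fb[j] for j in range(3)) for i in range(3)]
    marg_b = [sum(table[i][j] * fa[i] for i in range(3)) for j in range(3)]
    return marg_a, marg_b


def has_marginal_effect(table, maf_a, maf_b, tol=1e-12):
    """True if the marginal penetrance varies across genotypes at either locus."""
    marg_a, marg_b = marginal_penetrances(table, maf_a, maf_b)
    return (max(marg_a) - min(marg_a) > tol) or (max(marg_b) - min(marg_b) > tol)


# Hypothetical eNME-style table: checkerboard (XOR-like) penetrances.
enme_table = [
    [0.0, 0.1, 0.0],
    [0.1, 0.0, 0.1],
    [0.0, 0.1, 0.0],
]

# Hypothetical eME-style table: risk only when both loci carry a minor allele.
eme_table = [
    [0.0, 0.0, 0.0],
    [0.0, 0.1, 0.1],
    [0.0, 0.1, 0.1],
]

print(has_marginal_effect(enme_table, 0.5, 0.5))  # False
print(has_marginal_effect(eme_table, 0.5, 0.5))   # True
```

Whether a model displays marginal effects depends on the allele frequencies as well as on the penetrance table. For example, the checkerboard table above has no marginal effect at a minor allele frequency of 0.5, but it does show one at 0.3. This is why eNME models are usually specified together with the allele frequencies they assume.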
