DESeeker: Detecting Epistatic Interactions Using a Two-Stage Differential Evolution Algorithm

Epistatic interactions are generally defined as interactions between different single-nucleotide polymorphisms (SNPs). Identifying them is important for determining individual susceptibility to complex diseases. In large-scale association studies, finding epistatic interactions within large volumes of SNP data is challenging. Because current search approaches face a heavy computational burden, an efficient algorithm for this computationally intensive problem would be valuable. In this paper, a novel differential evolution (DE)-based algorithm, DESeeker, is proposed to detect epistatic interactions. DESeeker employs a two-stage DE design, combined with a local search and a self-adapting parameter-tuning strategy, to enhance its search capability. DESeeker is compared with other recent algorithms on a set of simulated datasets and a real biological dataset. The experimental results on the simulated datasets show that the proposed algorithm outperforms the compared algorithms in terms of detection power. The findings on the real biological dataset demonstrate that the proposed algorithm is promising for practical disease prognosis.
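The abstract does not specify DESeeker's encoding, scoring function, or how its two stages divide the work, so the Python sketch below only illustrates the general ideas it names, under loudly stated assumptions. Each candidate is a set of k SNP indices. The fitness is a Pearson chi-square statistic of the genotype-combination x (control, case) table, which is an assumption and not necessarily DESeeker's score. The control parameters F and CR self-adapt with the jDE rule of Brest et al. (2006), swapped in here as one well-known self-adapting scheme. Local search is a simple one-SNP-swap hill climb, and "two-stage" is interpreted as a DE stage that narrows the SNP pool followed by an exhaustive scan of that pool. All function names (`chi_square_score`, `repair`, `de_stage`, `local_search`, `two_stage_search`) and the toy data are hypothetical.

```python
import itertools
import random


def chi_square_score(G, y, snps):
    """Pearson chi-square of the genotype-combination x (control, case) table.
    ASSUMPTION: stand-in fitness; DESeeker's actual score is not given here."""
    counts = {}  # genotype tuple over `snps` -> [controls, cases]
    for row, label in zip(G, y):
        cell = tuple(row[s] for s in snps)
        counts.setdefault(cell, [0, 0])[label] += 1
    n = len(y)
    col = [sum(c[j] for c in counts.values()) for j in (0, 1)]
    score = 0.0
    for c in counts.values():
        for j in (0, 1):
            expected = (c[0] + c[1]) * col[j] / n
            if expected > 0:
                score += (c[j] - expected) ** 2 / expected
    return score


def repair(vec, m, rng):
    """Round a real-valued DE vector to distinct SNP indices in [0, m)."""
    idx = []
    for v in vec:
        i = int(round(v)) % m
        while i in idx:  # replace duplicates with a random unused SNP
            i = rng.randrange(m)
        idx.append(i)
    return idx


def de_stage(G, y, k, pop_size=20, generations=30, seed=0):
    """Stage 1: DE/rand/1/bin over k-SNP combinations with jDE-style F/CR."""
    rng = random.Random(seed)
    m = len(G[0])
    pop = [rng.sample(range(m), k) for _ in range(pop_size)]
    fit = [chi_square_score(G, y, ind) for ind in pop]
    F, CR = [0.5] * pop_size, [0.9] * pop_size
    for _ in range(generations):
        for i in range(pop_size):
            # jDE rule (Brest et al., 2006): resample F or CR with probability 0.1.
            Fi = 0.1 + 0.9 * rng.random() if rng.random() < 0.1 else F[i]
            CRi = rng.random() if rng.random() < 0.1 else CR[i]
            r1, r2, r3 = rng.sample([j for j in range(pop_size) if j != i], 3)
            mutant = [pop[r1][d] + Fi * (pop[r2][d] - pop[r3][d]) for d in range(k)]
            jrand = rng.randrange(k)  # guarantees at least one mutant gene
            trial = [mutant[d] if rng.random() < CRi or d == jrand else pop[i][d]
                     for d in range(k)]
            trial = repair(trial, m, rng)
            f_trial = chi_square_score(G, y, trial)
            if f_trial >= fit[i]:  # greedy selection; adapted F/CR survive with it
                pop[i], fit[i], F[i], CR[i] = trial, f_trial, Fi, CRi
    return pop, fit


def local_search(G, y, snps, fit, rng, tries=20):
    """Hill-climb: swap one SNP of the combination for a random unused SNP."""
    best, best_fit = list(snps), fit
    m = len(G[0])
    for _ in range(tries):
        new = rng.randrange(m)
        if new in best:
            continue
        cand = list(best)
        cand[rng.randrange(len(cand))] = new
        f = chi_square_score(G, y, cand)
        if f > best_fit:
            best, best_fit = cand, f
    return best, best_fit


def two_stage_search(G, y, k=2, top=5, seed=0):
    """Stage 1 (DE + local search) narrows the SNP pool; stage 2 scans it exhaustively."""
    rng = random.Random(seed + 1)
    pop, fit = de_stage(G, y, k, seed=seed)
    best_ids = sorted(range(len(pop)), key=lambda i: fit[i], reverse=True)[:top]
    pool = set()
    for i in best_ids:
        snps, _ = local_search(G, y, pop[i], fit[i], rng)
        pool.update(snps)
    best = max(itertools.combinations(sorted(pool), k),
               key=lambda c: chi_square_score(G, y, c))
    return best, chi_square_score(G, y, best)


if __name__ == "__main__":
    # Hypothetical toy data: case status is the XOR of "SNP 3 carries a minor
    # allele" and "SNP 7 carries a minor allele"; the other 18 SNPs are noise.
    data_rng = random.Random(1)
    G = [[data_rng.randrange(3) for _ in range(20)] for _ in range(400)]
    y = [int((row[3] > 0) != (row[7] > 0)) for row in G]
    print(two_stage_search(G, y, k=2))
```

In the toy setting every genotype cell of the pair (3, 7) contains only cases or only controls, so its chi-square equals the sample size (400), the largest value any SNP pair can reach; the planted pair therefore wins the stage-2 scan whenever stage 1 places both SNPs in the pool. Rounding real-valued DE vectors to indices treats SNP positions as coordinates, which is a crude discretization; binary or set-based DE encodings are common alternatives.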
