HS-MMGKG: A Fast Multi-objective Harmony Search Algorithm for Two-locus Model Detection in GWAS

Genome-wide association studies (GWAS) play an important role in identifying the causes of disease. Most existing methods for detecting genetic interactions in GWAS are designed around a single correlation model, so their performance varies considerably across disease models. These methods also tend to have high computational cost and low accuracy. We present a new multi-objective heuristic optimization method, HS-MMGKG, for detecting genetic interactions. HS-MMGKG uses harmony search with five objective functions to improve efficiency and accuracy. A new strategy based on p-values and MDR is adopted to generate more reasonable results, and the Boolean representation used in BOOST is modified so that the five functions can be calculated rapidly. Together, these strategies reduce computation time and improve accuracy when detecting potential disease models. We compared HS-MMGKG with CSE, MACOED and FHSA-SED on 26 simulated datasets. The experimental results show that our method outperforms the others in both accuracy and computation time. Applied to the WTCCC dataset, our method identified many two-locus SNP combinations associated with seven diseases, and some of these SNPs have direct supporting evidence in the CTD database. These results may help to further explain the pathogenesis of the diseases. We anticipate that the proposed algorithm could be used in GWAS to help understand disease mechanisms and to support diagnosis and prognosis.
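To make the Boolean representation concrete, the following is a minimal sketch of the general idea behind BOOST-style counting, not the modified representation used in HS-MMGKG, whose details are not given here. It assumes genotypes coded 0/1/2 and case labels coded 1; the function names `pack_genotypes`, `pack_labels` and `contingency_table` are hypothetical. Each SNP is stored as three bitmasks (one per genotype), so the case and control counts of every two-locus genotype combination reduce to a bitwise AND followed by a bit count.

```python
from itertools import product


def pack_genotypes(genotypes):
    """Return three integer bitmasks, one per genotype value (0, 1, 2).

    Bit i of masks[g] is set when sample i carries genotype g.
    """
    masks = [0, 0, 0]
    for i, g in enumerate(genotypes):
        masks[g] |= 1 << i
    return masks


def pack_labels(labels):
    """Return (case_mask, control_mask); label 1 is assumed to mean 'case'."""
    case_mask = 0
    for i, y in enumerate(labels):
        if y == 1:
            case_mask |= 1 << i
    all_mask = (1 << len(labels)) - 1
    return case_mask, all_mask & ~case_mask


def contingency_table(snp_a, snp_b, labels):
    """Count cases and controls for each of the 9 two-locus genotype pairs.

    Returns a dict mapping (genotype_a, genotype_b) -> (n_cases, n_controls).
    """
    masks_a = pack_genotypes(snp_a)
    masks_b = pack_genotypes(snp_b)
    case_mask, control_mask = pack_labels(labels)
    table = {}
    for ga, gb in product(range(3), repeat=2):
        joint = masks_a[ga] & masks_b[gb]  # samples carrying both genotypes
        n_cases = bin(joint & case_mask).count("1")
        n_controls = bin(joint & control_mask).count("1")
        table[(ga, gb)] = (n_cases, n_controls)
    return table


# Toy data: six samples, two SNPs, three cases and three controls.
snp_a = [0, 1, 2, 1, 0, 2]
snp_b = [0, 1, 1, 1, 0, 2]
labels = [1, 1, 0, 0, 0, 1]
table = contingency_table(snp_a, snp_b, labels)
```

In a search over SNP pairs the per-SNP masks would be packed once and reused, so only the AND and bit-count steps are repeated for each candidate pair. Objective functions that depend only on these genotype counts can then be evaluated from the same table.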
