A new approach to detecting epistasis using a parallel implementation of ant colony optimization in the MapReduce framework

Genome-wide association studies (GWAS) involve the detection and interpretation of epistasis, which is thought to account for part of the ‘missing heritability’ and to influence susceptibility to common complex diseases. Many epistasis detection algorithms cannot be applied directly to GWAS because many combinations of genetic components are present in only a small number of samples, or in none at all. Owing to the huge number of single nucleotide polymorphisms (SNPs) and the lack of appropriate statistical tests, epistasis detection remains a computational and statistical challenge in genetic epidemiology. Here, we develop a novel method to identify epistatic interactions related to disease susceptibility using an ant colony optimization strategy implemented on Google's MapReduce framework. We incorporate expert knowledge into the pheromone updating rule to guide ants toward the best choices during the search. Extensive experiments on simulated and real genome-wide data sets demonstrate the excellent performance of our algorithm compared with its competitors.
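The abstract only outlines the method, so the Python sketch below illustrates one way its pieces could fit together. It is not the authors' implementation, and every detail is an assumption. The fitness of a SNP pair is taken to be the chi-square statistic of its 2 x 9 case/control by genotype-combination table, with empty combinations skipped. Ants pick SNPs by pheromone-proportional roulette selection. The "expert knowledge" is a hypothetical per-SNP score in [0, 1] that scales the pheromone deposit. The MapReduce job is simulated serially: a map step scores each ant's pair independently, and a reduce step sums fitness per SNP key. All function names and parameter values (`chi_square_pair`, `aco_epistasis`, `rho`, and so on) are illustrative.

```python
import random
from collections import defaultdict


def chi_square_pair(genotypes, labels, i, j):
    """Chi-square of the 2 x 9 (class x genotype combination) table for SNPs i and j.

    genotypes: one list of 0/1/2 codes per sample; labels: 0 = control, 1 = case.
    Combinations carried by no sample (expected count 0) are skipped.
    """
    counts = [[0] * 9 for _ in range(2)]
    for g, y in zip(genotypes, labels):
        counts[y][3 * g[i] + g[j]] += 1
    n = len(labels)
    rows = [sum(r) for r in counts]
    stat = 0.0
    for k in range(9):
        col = counts[0][k] + counts[1][k]
        for r in range(2):
            expected = rows[r] * col / n
            if expected > 0:
                stat += (counts[r][k] - expected) ** 2 / expected
    return stat


def construct_pair(tau, rng):
    """One ant picks two distinct SNPs with probability proportional to pheromone."""
    n = len(tau)
    i = rng.choices(range(n), weights=tau)[0]
    others = [0.0 if s == i else t for s, t in enumerate(tau)]
    j = rng.choices(range(n), weights=others)[0]
    return (min(i, j), max(i, j))


def map_ant(pair, genotypes, labels):
    """Map step: score one ant's SNP pair independently of all other ants."""
    return pair, chi_square_pair(genotypes, labels, *pair)


def reduce_by_snp(scored):
    """Reduce step: sum the fitness credited to each SNP key."""
    totals = defaultdict(float)
    for (i, j), f in scored:
        totals[i] += f
        totals[j] += f
    return totals


def update_pheromone(tau, totals, expert, rho):
    """Evaporate, then deposit normalized fitness scaled by the hypothetical expert score."""
    z = sum(totals.values()) or 1.0
    for s in range(len(tau)):
        deposit = totals.get(s, 0.0) / z
        tau[s] = (1 - rho) * tau[s] + rho * (1 + expert[s]) * deposit


def aco_epistasis(genotypes, labels, expert, n_ants=50, n_iter=30, rho=0.1, seed=0):
    """Return (best SNP pair, its chi-square, final pheromone vector)."""
    rng = random.Random(seed)
    tau = [1.0] * len(genotypes[0])
    best_pair, best_f = None, -1.0
    for _ in range(n_iter):
        pairs = [construct_pair(tau, rng) for _ in range(n_ants)]
        # On a cluster these map calls would run on parallel workers; here they run serially.
        scored = [map_ant(p, genotypes, labels) for p in pairs]
        pair, f = max(scored, key=lambda t: t[1])
        if f > best_f:
            best_pair, best_f = pair, f
        update_pheromone(tau, reduce_by_snp(scored), expert, rho)
    return best_pair, best_f, tau
```

As a usage example, simulate 400 samples over 10 SNPs whose case status depends on the parity of the genotype sum at SNPs 2 and 5, then call `aco_epistasis(genotypes, labels, [0.0] * 10)`. The pair (2, 5) separates cases from controls perfectly, so its chi-square equals the sample size. Only the ant scoring is distributed because each ant's fitness evaluation is independent. The pheromone update needs every ant's result and therefore sits after the reduce step.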
