IGENT: efficient entropy based algorithm for genome-wide gene-gene interaction analysis

Background: With the development of high-throughput genotyping and sequencing technology, there is growing evidence of association between genetic variants and complex traits. Although thousands of genetic variants have been discovered, these markers have been shown to explain only a very small proportion of the underlying genetic variance of complex traits. Gene-gene interaction (GGI) analysis is expected to unveil a large portion of the unexplained heritability of complex traits.

Methods: In this work, we propose IGENT, an Information theory-based GEnome-wide gene-gene iNTeraction method. IGENT is an efficient algorithm for identifying genome-wide gene-gene interactions (GGI) and gene-environment interactions (GEI). Detecting significant GGIs at genome-wide scale requires a substantial reduction in computational burden. Our method uses information gain (IG) and evaluates its significance without resampling.

Results: In our simulation studies, the power of IGENT is shown to be better than or equivalent to that of BOOST. The proposed method successfully detected GGIs for bipolar disorder in the Wellcome Trust Case Control Consortium (WTCCC) data and for age-related macular degeneration (AMD).

Conclusions: The proposed method is implemented in C++ and is available on Windows, Linux and Mac OS X.
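The abstract does not define the information gain (IG) that IGENT computes, so the following is only a minimal sketch under a stated assumption: it uses one common information-theoretic definition of pairwise interaction, IG = I(A,B;Y) - I(A;Y) - I(B;Y), where A and B are SNP genotypes and Y is case-control status. The function names and the toy data are hypothetical and are not taken from the IGENT implementation. The significance test without resampling is not reproduced here.

```python
from collections import Counter
from math import log2


def entropy(values):
    """Shannon entropy (in bits) of the empirical distribution of `values`."""
    n = len(values)
    return -sum((c / n) * log2(c / n) for c in Counter(values).values())


def mutual_information(x, y):
    """Empirical mutual information I(X;Y) = H(X) + H(Y) - H(X,Y), in bits."""
    return entropy(x) + entropy(y) - entropy(list(zip(x, y)))


def information_gain(snp_a, snp_b, phenotype):
    """Assumed interaction measure: IG = I(A,B;Y) - I(A;Y) - I(B;Y).

    This is a common definition of interaction information gain; the
    abstract does not confirm that IGENT uses exactly this form.
    """
    joint = list(zip(snp_a, snp_b))
    return (mutual_information(joint, phenotype)
            - mutual_information(snp_a, phenotype)
            - mutual_information(snp_b, phenotype))


# Hypothetical toy data: a pure interaction (XOR) with no marginal effects.
snp_a = [0, 0, 1, 1] * 25
snp_b = [0, 1, 0, 1] * 25
status = [a ^ b for a, b in zip(snp_a, snp_b)]

ig = information_gain(snp_a, snp_b, status)
print(f"IG = {ig:.3f} bits")
```

In this toy example neither SNP alone carries information about the phenotype, while the pair determines it completely, so all of the joint information is attributed to the interaction term.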
