To Control False Positives in Gene-Gene Interaction Analysis: Two Novel Conditional Entropy-Based Approaches

Genome-wide analysis of gene-gene interactions has been recognized as a powerful avenue for identifying the missing genetic components that cannot be detected by current single-point association analysis. Recently, several model-free methods (e.g., the commonly used information-based metrics and several logistic regression-based metrics) were developed for detecting non-linear dependence between genetic loci, but they are at risk of an inflated false positive rate, particularly when the main effects at one or both loci are salient. In this study, we propose two conditional entropy-based metrics to address this limitation. Extensive simulations demonstrated that, provided the disease is rare, the two proposed metrics consistently maintain the correct false positive rate. In scenarios for a common disease, the proposed metrics achieved better or comparable control of false positive error compared with four previously proposed model-free metrics. In terms of power, our methods outperformed several competing metrics in a range of common disease models. Furthermore, in real data analyses, both metrics succeeded in detecting interactions and were competitive with the originally reported results or with the logistic regression approaches. In conclusion, the proposed conditional entropy-based metrics are promising alternatives to current model-based approaches for detecting genuine epistatic effects.
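
The abstract does not give the definitions of the two proposed metrics, so the following is only a minimal sketch of the generic quantity they build on: the conditional entropy H(B | A) of one locus given another, estimated from a table of genotype counts. The function names, the 3x3 tables, and the counts are hypothetical and are not taken from the paper.

```python
import numpy as np

def entropy(p):
    """Shannon entropy in bits of a probability vector or table (zeros are skipped)."""
    p = np.asarray(p, dtype=float).ravel()
    p = p[p > 0]
    return float(-np.sum(p * np.log2(p)))

def conditional_entropy(counts):
    """H(B | A) = H(A, B) - H(A) for a count table with rows = locus A, columns = locus B."""
    joint = np.asarray(counts, dtype=float)
    joint = joint / joint.sum()          # counts -> joint probabilities
    marginal_a = joint.sum(axis=1)       # marginal distribution of locus A
    return entropy(joint) - entropy(marginal_a)

# Hypothetical genotype counts (3 genotypes per locus), for illustration only.
independent = np.outer([50, 30, 20], [40, 40, 20])  # loci statistically independent
dependent = np.diag([50, 30, 20])                   # locus B fully determined by locus A

h_indep = conditional_entropy(independent)  # equals the marginal entropy H(B)
h_dep = conditional_entropy(dependent)      # knowing A leaves no uncertainty about B
```

When the loci are independent, conditioning on A does not reduce the uncertainty about B, so H(B | A) equals H(B); when B is fully determined by A, H(B | A) drops to zero. An interaction metric of this kind would typically compare such quantities between cases and controls, but how the paper does so is not stated here.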
