An information-gain approach to detecting three-way epistatic interactions in genetic association studies

Background: Epistasis has historically described the phenomenon in which the effect of a given gene on a phenotype depends on one or more other genes, and it is essential for understanding the association between genetic and phenotypic variation. Quantifying epistasis of orders higher than two is very challenging, owing both to the computational complexity of enumerating all possible combinations in genome-wide data and to the lack of efficient and effective methodologies.

Objectives: In this study, we propose a fast, non-parametric, and model-free measure for three-way epistasis.

Methods: The measure is based on information gain and separates all lower-order effects from pure three-way epistasis.

Results: Our method was verified on synthetic data and applied to real data from a candidate-gene study of tuberculosis in a West African population. In the tuberculosis data, we found a statistically significant pure three-way epistatic interaction effect that was stronger than any lower-order association.

Conclusion: Our study provides a methodological basis for detecting and characterizing high-order gene-gene interactions in genetic association studies.
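To make the idea in the Methods concrete, the following is a minimal sketch of one way an information-gain measure can isolate a pure three-way effect. The abstract does not give the formula, so the definition used here is an assumption: the three-way term is the mutual information between the joint genotype (A, B, C) and the phenotype Y, minus all pairwise information gains and all main effects. The function names (`entropy`, `mutual_info`, `ig2`, `ig3`) and the toy XOR data are hypothetical and serve only as illustration.

```python
from collections import Counter
from itertools import product
from math import log2


def entropy(*columns):
    """Shannon entropy (in bits) of the joint distribution of the columns."""
    joint = list(zip(*columns))
    n = len(joint)
    return -sum((c / n) * log2(c / n) for c in Counter(joint).values())


def mutual_info(features, y):
    """I(features; y) = H(features) + H(y) - H(features, y)."""
    return entropy(*features) + entropy(y) - entropy(*features, y)


def ig2(a, b, y):
    """Assumed pairwise information gain: joint information minus main effects."""
    return mutual_info([a, b], y) - mutual_info([a], y) - mutual_info([b], y)


def ig3(a, b, c, y):
    """Assumed three-way information gain: joint information minus all
    pairwise gains and all main effects."""
    joint = mutual_info([a, b, c], y)
    pairs = ig2(a, b, y) + ig2(a, c, y) + ig2(b, c, y)
    mains = mutual_info([a], y) + mutual_info([b], y) + mutual_info([c], y)
    return joint - pairs - mains


# Toy data (hypothetical): three binary attributes, phenotype is their XOR,
# so no single attribute or pair carries any information about y.
rows = list(product([0, 1], repeat=3))
a, b, c = (list(col) for col in zip(*rows))
y = [i ^ j ^ k for i, j, k in rows]

print(ig3(a, b, c, y))  # the full 1 bit is attributed to the three-way term
```

In this toy case every main effect and pairwise gain is zero, so all of the information about the phenotype lands in the three-way term. With real genotype data the entropies would be estimated from observed frequencies, and significance would need to be assessed separately, for example by permutation testing.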
