A novel information theoretic method for detecting gene-gene and gene-environment interactions in complex diseases

Gene-gene and gene-environment interactions play important roles in the etiology of complex multi-factorial diseases. With advances in genotyping technology, large genetic association studies based on hundreds of thousands of single-nucleotide polymorphisms have become a popular option for the study of complex diseases. In this paper we use information-theoretic concepts to develop a novel method for detecting statistical gene-gene and gene-environment interactions in complex disease models. We evaluate the effectiveness of our method through extensive simulations under different gene-gene interaction models and on the rheumatoid arthritis dataset from Genetic Analysis Workshop 15. We compare its performance with that of the well-known multi-factor dimensionality reduction (MDR) and generalized MDR (GMDR) methods. We demonstrate that our method is capable of analyzing a diverse range of epidemiological data sets containing evidence of gene-gene interactions.
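
The abstract does not spell out the statistic it uses, so the following is only an illustrative sketch of one standard information-theoretic quantity for this kind of problem, the three-way interaction information I(A;B;C) = I(A,B;C) - I(A;C) - I(B;C), and not necessarily the method developed in the paper. The function names, the plug-in (frequency-count) entropy estimator, and the toy XOR data are all assumptions made for illustration. Under this sign convention, a positive value suggests that two factors jointly carry information about disease status that neither carries alone.

```python
from collections import Counter
from math import log2


def entropy(samples):
    """Plug-in Shannon entropy (in bits) of a sequence of hashable observations."""
    n = len(samples)
    counts = Counter(samples)
    return -sum((c / n) * log2(c / n) for c in counts.values())


def mutual_information(x, y):
    """I(X;Y) = H(X) + H(Y) - H(X,Y), estimated from paired samples."""
    return entropy(x) + entropy(y) - entropy(list(zip(x, y)))


def interaction_information(a, b, c):
    """I(A;B;C) = I(A,B;C) - I(A;C) - I(B;C); positive values indicate synergy."""
    joint_ab = list(zip(a, b))
    return (mutual_information(joint_ab, c)
            - mutual_information(a, c)
            - mutual_information(b, c))


# Hypothetical toy data: two binary factors with no main effects,
# where case/control status is the XOR of the two (a purely epistatic model).
snp_a = [0, 0, 1, 1] * 25
snp_b = [0, 1, 0, 1] * 25
status = [a ^ b for a, b in zip(snp_a, snp_b)]

print("I(A;status)   =", mutual_information(snp_a, status))
print("I(B;status)   =", mutual_information(snp_b, status))
print("I(A;B;status) =", interaction_information(snp_a, snp_b, status))
```

In this toy model each factor alone shares no information with status, while the pair determines it completely, so the interaction term accounts for all of the information. With real genotype data the plug-in estimate is biased upward in small samples, so a permutation test or similar significance assessment would be needed before interpreting a positive value.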
