Information-theoretic metrics for visualizing gene-environment interactions.

The purpose of our work was to develop heuristics for visualizing and interpreting gene-environment interactions (GEIs) and to assess how candidate visualization metrics depend on biological and study-design factors. Two information-theoretic metrics, the k-way interaction information (KWII) and the total correlation information (TCI), were investigated. The effectiveness of the KWII and TCI in detecting GEIs was assessed in a diverse range of simulated data sets and in a Crohn disease data set, and the sensitivity of the KWII and TCI spectra to biological and study-design variables was determined. Head-to-head comparisons were made with the relevance-chain, multifactor dimensionality reduction, and pedigree disequilibrium test (PDT) methods. The KWII and TCI spectra, which are graphical summaries of the KWII and TCI for each subset of environmental and genotype variables, detected each known GEI in the simulated data sets. The patterns in the spectra were informative about factors such as case-control misassignment, locus heterogeneity, allele frequencies, and linkage disequilibrium. The spectra also had excellent sensitivity for identifying the key disease-associated genetic variations in the Crohn disease data set. In the head-to-head comparisons with the relevance-chain, multifactor dimensionality reduction, and PDT methods, visual interpretation of the KWII and TCI spectra performed satisfactorily. The KWII and TCI are promising metrics for visualizing GEIs: they are capable of detecting interactions among numerous single-nucleotide polymorphisms and environmental variables for a diverse range of GEI models.
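
The abstract does not give formulas for the two metrics. The sketch below assumes the standard textbook definitions: the KWII of a variable set as the alternating sum of joint entropies over all of its non-empty subsets, and the TCI as the sum of the marginal entropies minus the joint entropy. The function names (`entropy`, `kwii`, `tci`, `spectrum`), the plug-in entropy estimate, and the toy XOR data are illustrative assumptions, not the authors' implementation.

```python
from collections import Counter
from itertools import combinations
from math import log2


def entropy(columns):
    """Plug-in joint entropy (bits) of one or more equal-length discrete columns."""
    rows = list(zip(*columns))
    n = len(rows)
    return -sum((c / n) * log2(c / n) for c in Counter(rows).values())


def kwii(columns):
    """Assumed definition: KWII(S) = -sum over non-empty T in S of (-1)^(|S|-|T|) H(T)."""
    k = len(columns)
    total = 0.0
    for size in range(1, k + 1):
        sign = (-1) ** (k - size + 1)  # outer minus sign folded into the exponent
        for subset in combinations(columns, size):
            total += sign * entropy(subset)
    return total


def tci(columns):
    """Assumed definition: TCI(S) = sum of marginal entropies - joint entropy."""
    return sum(entropy([c]) for c in columns) - entropy(columns)


def spectrum(predictors, phenotype, metric):
    """Metric value for every non-empty predictor subset joined with the phenotype."""
    names = sorted(predictors)
    result = {}
    for size in range(1, len(names) + 1):
        for combo in combinations(names, size):
            cols = [predictors[name] for name in combo] + [phenotype]
            result[combo] = metric(cols)
    return result


# Hypothetical toy data: disease status is the XOR of a genotype and an exposure,
# so neither variable is informative alone but the pair is.
gene = [0, 0, 1, 1]
env = [0, 1, 0, 1]
disease = [g ^ e for g, e in zip(gene, env)]

kwii_spectrum = spectrum({"gene": gene, "env": env}, disease, kwii)
tci_spectrum = spectrum({"gene": gene, "env": env}, disease, tci)
```

In this toy case the pairwise entries (`("gene",)` and `("env",)`, each joined with the phenotype) are zero for both metrics, while the `("env", "gene")` entry equals one bit. That is the kind of peak a KWII or TCI spectrum would be expected to show for a pure interaction, under the assumed definitions.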
