The Interaction Index: a novel information-theoretic metric for prioritizing interacting genetic variations and environmental factors

We developed an information-theoretic metric, the Interaction Index, for prioritizing genetic variations and environmental variables for follow-up in detailed sequencing studies. The Interaction Index was effective at prioritizing the genetic and environmental variables involved in gene-environment interactions (GEI) across a diverse range of simulated data sets. We also evaluated the metric on a 103-SNP Crohn's disease data set and on a simulated data set, modeled on a rheumatoid arthritis data set, that contains 9187 SNPs and multiple covariates. Our results demonstrate that the Interaction Index algorithm is effective and efficient for prioritizing interacting variables in epidemiologic data sets containing complex combinations of direct effects, multiple gene-gene interactions (GGI), and multiple GEI.
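The abstract does not state how the Interaction Index is computed, so the sketch below does not reproduce it. As a hedged illustration of a standard information-theoretic interaction measure, it computes McGill's k-way interaction information from discrete genotype, exposure, and phenotype columns and ranks candidate variable pairs by it. The helper names (`entropy`, `interaction_information`, `rank_pairs_by_interaction`), the sign convention (positive values indicate synergy), and the toy data are assumptions made for illustration only.

```python
from collections import Counter
from itertools import combinations
from math import log2


def entropy(columns):
    """Joint Shannon entropy (in bits) of one or more discrete columns.

    `columns` is a list of equal-length sequences; each row is the tuple of
    values taken across those columns.
    """
    rows = list(zip(*columns))
    n = len(rows)
    counts = Counter(rows)
    return -sum((c / n) * log2(c / n) for c in counts.values())


def interaction_information(columns):
    """McGill's k-way interaction information over the given columns.

    Uses KWII(S) = -sum over non-empty T subset of S of (-1)^(|S|-|T|) H(T),
    so for two columns it equals mutual information and for three columns
    it equals I(A;B|C) - I(A;B); positive values indicate synergy.
    """
    k = len(columns)
    total = 0.0
    for size in range(1, k + 1):
        sign = (-1) ** (k - size)
        for subset in combinations(columns, size):
            total -= sign * entropy(list(subset))
    return total


def rank_pairs_by_interaction(variables, phenotype):
    """Hypothetical helper: score every pair of candidate variables by its
    3-way interaction information with the phenotype, highest score first."""
    scores = []
    for (name_a, col_a), (name_b, col_b) in combinations(variables.items(), 2):
        score = interaction_information([col_a, col_b, phenotype])
        scores.append(((name_a, name_b), score))
    return sorted(scores, key=lambda item: item[1], reverse=True)


# Toy example (hypothetical data): the phenotype is the XOR of snp1 and env,
# so neither variable is informative on its own but the pair is; snp2 is noise.
snp1 = [0, 0, 1, 1, 0, 0, 1, 1]
env = [0, 1, 0, 1, 0, 1, 0, 1]
snp2 = [0, 0, 0, 0, 1, 1, 1, 1]
pheno = [a ^ b for a, b in zip(snp1, env)]

ranked = rank_pairs_by_interaction({"snp1": snp1, "env": env, "snp2": snp2}, pheno)
for pair, score in ranked:
    print(pair, round(score, 3))
```

In this toy data the mutual information between snp1 and the phenotype is zero, yet the snp1/env pair carries 1 bit of interaction information while both pairs involving snp2 score 0. This is the pattern a single-variable association scan would miss and a pairwise interaction measure is meant to surface.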
