Transferring entropy to the realm of GxG interactions

Abstract: Genome-wide association studies are giving way to genome-wide interaction studies, as the genetic background of many diseases appears to be more complex than previously supposed. Many statistical approaches have therefore been proposed to detect gene–gene (GxG) interactions, among them numerous information theory-based methods inspired by the concept of entropy. These methods are suggested to be particularly powerful and, being nonlinear themselves, better able to capture nonlinear relationships between genetic variants and/or variables. However, the entropy-based estimators introduced so far differ to a surprising extent in their construction, and even in their basic definition of an interaction. Moreover, not every entropy-based interaction measure is accompanied by a proper statistical test. To shed light on this, a systematic review of the literature is presented that answers the following questions: (1) How are GxG interactions defined within the framework of information theory? (2) Which entropy-based test statistics are available? (3) Which underlying distribution do the test statistics follow? (4) What are the stated strengths and limitations of these test statistics?
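As a rough illustration of what an entropy-based interaction measure can look like, the sketch below computes one common definition, the information gain IG(A;B;Y) = I(A,B;Y) - I(A;Y) - I(B;Y), from plug-in (empirical frequency) entropy estimates. This is only one of the differing definitions the abstract alludes to, not necessarily the one favoured by the review. The function names and the binary XOR toy data are assumptions made for illustration, and no statistical test is attached.

```python
from collections import Counter
from math import log2


def entropy(samples):
    """Shannon entropy in bits of the empirical distribution of hashable samples."""
    n = len(samples)
    return -sum((c / n) * log2(c / n) for c in Counter(samples).values())


def mutual_information(x, y):
    """Plug-in estimate of I(X;Y) = H(X) + H(Y) - H(X,Y)."""
    return entropy(x) + entropy(y) - entropy(list(zip(x, y)))


def information_gain(a, b, y):
    """IG(A;B;Y) = I(A,B;Y) - I(A;Y) - I(B;Y), one possible interaction measure."""
    joint = list(zip(a, b))  # treat the pair (A, B) as a single variable
    return (mutual_information(joint, y)
            - mutual_information(a, y)
            - mutual_information(b, y))


# Hypothetical toy data: two binary "loci" and a phenotype that is their XOR,
# so neither locus is informative alone but the pair determines the phenotype.
a = [0, 0, 1, 1] * 25
b = [0, 1, 0, 1] * 25
y = [ai ^ bi for ai, bi in zip(a, b)]

ig = information_gain(a, b, y)
print(f"I(A;Y) = {mutual_information(a, y):.3f} bits")
print(f"I(B;Y) = {mutual_information(b, y):.3f} bits")
print(f"IG(A;B;Y) = {ig:.3f} bits")
```

In this toy case the marginal mutual informations are zero while the information gain is about one bit, the kind of purely joint effect that such measures are meant to pick up. Real genotype data would typically be coded with three levels per locus, and plug-in estimates are biased in small samples.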
