Equitability, mutual information, and the maximal information coefficient

Significance: Attention has recently focused on a basic yet unresolved problem in statistics: How can one quantify the strength of a statistical association between two variables without bias for relationships of a specific form? Here we propose a way of mathematically formalizing this “equitability” criterion, using core concepts from information theory. This criterion is naturally satisfied by a fundamental information-theoretic measure of dependence called “mutual information.” By contrast, a recently introduced dependence measure called the “maximal information coefficient” is seen to violate equitability. We conclude that estimating mutual information provides a natural and practical method for equitably quantifying associations in large datasets.

Abstract: How should one quantify the strength of association between two random variables without bias for relationships of a specific form? Despite its conceptual simplicity, this notion of statistical “equitability” has yet to receive a definitive mathematical formalization. Here we argue that equitability is properly formalized by a self-consistency condition closely related to the Data Processing Inequality. Mutual information, a fundamental quantity in information theory, is shown to satisfy this equitability criterion. These findings are at odds with the recent work of Reshef et al. [Reshef DN, et al. (2011) Science 334(6062):1518–1524], which proposed an alternative definition of equitability and introduced a new statistic, the “maximal information coefficient” (MIC), said to satisfy equitability in contradistinction to mutual information. These conclusions, however, were supported only with limited simulation evidence, not with mathematical arguments. Upon revisiting these claims, we prove that the mathematical definition of equitability proposed by Reshef et al. cannot be satisfied by any (nontrivial) dependence measure. We also identify artifacts in the reported simulation evidence. When these artifacts are removed, estimates of mutual information are found to be more equitable than estimates of MIC. Mutual information is also observed to have consistently higher statistical power than MIC. We conclude that estimating mutual information provides a natural (and often practical) way to equitably quantify statistical associations in large datasets.
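
To make the self-consistency criterion concrete, the sketch below (an illustration added for this write-up, not code or analysis from either paper) generates data in which y depends on x only through a deterministic function f(x), and checks that nearest-neighbor estimates of mutual information assign nearly the same value to the pairs (x, y) and (f(x), y), as the Data Processing Inequality guarantees for the exact quantities. The specific functions, the noise level, and the use of scikit-learn’s mutual_info_regression (a k-nearest-neighbor estimator in the spirit of Kraskov et al., cited in the reference list) are assumptions chosen here for illustration only.

    # Illustrative sketch only: if y depends on x solely through f(x), a self-consistent
    # (equitable) dependence measure should give (x, y) and (f(x), y) the same score.
    # Mutual information has this property exactly; k-nearest-neighbor estimates of it
    # should agree up to estimation error.  Functions and noise level are arbitrary choices.
    import numpy as np
    from sklearn.feature_selection import mutual_info_regression

    rng = np.random.default_rng(0)
    n = 5000
    x = rng.uniform(-3, 3, size=n)

    for name, f in [("linear", lambda t: t),
                    ("cubic", lambda t: t ** 3),
                    ("sine", lambda t: np.sin(2 * t))]:
        y = f(x) + 0.5 * rng.standard_normal(n)  # y depends on x only via f(x)
        mi_x = mutual_info_regression(x.reshape(-1, 1), y, n_neighbors=5)[0]
        mi_fx = mutual_info_regression(f(x).reshape(-1, 1), y, n_neighbors=5)[0]
        print(f"{name:6s}  I(x;y) ≈ {mi_x:.2f} nats   I(f(x);y) ≈ {mi_fx:.2f} nats")

The two estimates in each row should be close regardless of whether f is linear, strongly nonlinear, or non-invertible; scikit-learn reports these estimates in nats.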

[1] J. Neyman, E. S. Pearson. On the Problem of the Most Efficient Tests of Statistical Hypotheses, 1933, Philosophical Transactions of the Royal Society of London A.

[2] C. E. Shannon. A mathematical theory of communication, 1948, Bell System Technical Journal.

[3] G. A. Miller. Note on the bias of information estimates, 1955.

[4] E. H. Linfoot. An Informational Measure of Correlation, 1957, Information and Control.

[5] S. Kullback. Information Theory and Statistics, 1960.

[6] R. Moddemeijer. On estimation of entropy and mutual information of continuous distributions, 1989.

[7] T. M. Cover, J. A. Thomas. Elements of Information Theory, 2005.

[8] D. R. Wolf, et al. Estimating functions of probability distributions from a finite set of samples, 1994, Physical Review E.

[9] S. Panzeri, et al. The Upward Bias in Measures of Information Derived from Limited Data Samples, 1995, Neural Computation.

[10] Y. Moon, et al. Estimation of mutual information using kernel density estimators, 1995, Physical Review E.

[11] W. Bialek, et al. Spikes: Exploring the Neural Code, 1996.

[12] P. Grassberger, et al. Entropy estimation of symbol sequences, 1996, Chaos.

[13] I. S. Kohane, et al. Mutual information relevance networks: functional genomic clustering using pairwise entropy measurements, 1999, Pacific Symposium on Biocomputing.

[14] E. Oja, et al. Independent component analysis: algorithms and applications, 2000, Neural Networks.

[15] W. Bialek, et al. Entropy and Inference, Revisited, 2001, NIPS.

[16] C. O. Daub, et al. The mutual information: Detecting and evaluating dependencies between variables, 2002, ECCB.

[17] C. O. Daub, et al. Estimating mutual information using B-spline functions – an improved similarity measure for analysing gene expression data, 2004, BMC Bioinformatics.

[18] M. A. Viergever, et al. Mutual-information-based registration of medical images: a survey, 2003, IEEE Transactions on Medical Imaging.

[19] L. Paninski. Estimation of Entropy and Mutual Information, 2003, Neural Computation.

[20] W. Bialek, et al. Entropy and information in neural spike trains: progress on the sampling problem, 2003, Physical Review E.

[21] A. Chao, et al. Nonparametric estimation of Shannon’s index of diversity when there are unseen species in sample, 2004, Environmental and Ecological Statistics.

[22] I. Csiszár, et al. Information Theory and Statistics: A Tutorial, 2004, Foundations and Trends in Communications and Information Theory.

[23] W. Bialek, et al. Analyzing Neural Responses to Natural Signals: Maximally Informative Dimensions, 2002, Neural Computation.

[24] A. Kraskov, et al. Estimating mutual information, 2003, Physical Review E.

[25] T. Schürmann. Bias analysis in entropy estimation, 2004.

[26] W. Bialek, et al. Estimating mutual information and multi-information in large networks, 2005, arXiv.

[27] P. Rapp, et al. Statistical validation of mutual information calculations: comparison of alternative numerical algorithms, 2005, Physical Review E.

[28] K. D. Miller, et al. Adaptive filtering enhances information transmission in visual cortex, 2006, Nature.

[29] C. Wiggins, et al. ARACNE: An Algorithm for the Reconstruction of Gene Regulatory Networks in a Mammalian Cellular Context, 2004, BMC Bioinformatics.

[30] J. Kinney, et al. Precise physical models of protein–DNA interaction from high-throughput data, 2007, Proceedings of the National Academy of Sciences.

[31] S. Saigal, et al. Relative performance of mutual information estimation methods for quantifying the dependence among short and noisy data, 2007, Physical Review E.

[32] N. Slonim, et al. A universal framework for regulatory element discovery across all genomes and data types, 2007, Molecular Cell.

[33] S. Panzeri, et al. Correcting for the sampling bias problem in spike train information measures, 2007, Journal of Neurophysiology.

[34] M. L. Rizzo, et al. Brownian distance covariance, 2009, arXiv:1010.0297.

[35] T. Sharpee, et al. Estimating linear–nonlinear models using Rényi divergences, 2009, Network.

[36] Y. Li, et al. Estimation of Mutual Information: A Survey, 2009, RSKT.

[37] K. Strimmer, et al. Entropy Inference and the James-Stein Estimator, with Application to Nonlinear Gene Association Networks, 2008, Journal of Machine Learning Research.

[38] J. Kinney, et al. Using deep sequencing to characterize the biophysical mechanism of a transcriptional regulatory sequence, 2010, Proceedings of the National Academy of Sciences.

[39] B. Póczos, et al. Estimation of Renyi Entropy and Mutual Information Based on Generalized Nearest-Neighbor Graphs, 2010, NIPS.

[40] W. Liu, et al. Parallel mutual information estimation for inferring gene regulatory networks on GPUs, 2011, BMC Research Notes.

[41] T. Speed. A Correlation for the 21st Century, 2011, Science.

[42] J. E. García, et al. A non-parametric test of independence, 2011.

[43] D. N. Reshef, et al. Detecting Novel Associations in Large Data Sets, 2011, Science.

[44] G. S. Atwal, et al. Maximally informative models and diffeomorphic modes in the analysis of large data sets, 2012.

[45] Finding correlations in big data, 2012, Nature Biotechnology.

[46] M. Gorfine, et al. Comment on “Detecting Novel Associations in Large Data Sets”, 2012.

[47] M. Vinck, et al. Estimation of the entropy based on its polynomial representation, 2012, Physical Review E.

[48] H. Goodarzi, et al. Systematic discovery of structural elements governing stability of mammalian messenger RNAs, 2012, Nature.

[49] M. Mitzenmacher, et al. Equitability Analysis of the Maximal Information Coefficient, with Comparisons, 2013, arXiv.

[50] R. Heller, et al. A consistent multivariate test of association based on ranks of distances, 2012, arXiv:1201.3522.

[51] C. Furlanello, et al. minerva and minepy: a C engine for the MINE suite and its R, Python and MATLAB wrappers, 2012, Bioinformatics.

[52] J. B. Kinney, et al. Parametric inference in the large data limit using maximally informative models, 2012, bioRxiv.

[53] R. Tibshirani, et al. Comment on “Detecting Novel Associations in Large Data Sets” by Reshef et al., Science Dec 16, 2011, 2014, arXiv:1401.7645.