Equitability Analysis of the Maximal Information Coefficient, with Comparisons

A measure of dependence is said to be equitable if it gives similar scores to equally noisy relationships of different types. Equitability matters in data exploration when the goal is to identify a relatively small set of the strongest associations in a data set, rather than to find as many non-zero associations as possible, since the latter are often too numerous to sift through. An equitable statistic, such as the maximal information coefficient (MIC), can therefore be useful for analyzing high-dimensional data sets. Here, we explore both equitability and the properties of MIC, and discuss several aspects of the theory and practice of MIC. We begin by presenting an intuition for the equitability of MIC by examining the maximization and normalization steps in its definition. We then examine the speed and optimality of the approximation algorithm used to compute MIC, and suggest some directions for improving both. Finally, we demonstrate across a range of noise models and sample sizes that MIC is more equitable than natural alternatives, such as mutual information estimation and distance correlation.
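To make the definition of equitability concrete, the following is a minimal sketch of how one might probe it empirically. It is not the procedure used in the paper. It assumes additive Gaussian noise in y, three illustrative relationship types, and a target noise level expressed as the R^2 between the noisy y and the noiseless function value. MIC itself is not computed here; the sketch scores each relationship with the squared Pearson correlation and with a plain NumPy implementation of sample distance correlation, one of the alternatives named above. The helper names `noisy_sample` and `distance_correlation` are hypothetical.

```python
import numpy as np


def distance_correlation(x, y):
    """Sample distance correlation of two 1-D samples (V-statistic form)."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    # Pairwise absolute-difference (distance) matrices.
    a = np.abs(x[:, None] - x[None, :])
    b = np.abs(y[:, None] - y[None, :])
    # Double-center: subtract column and row means, add back the grand mean.
    A = a - a.mean(axis=0) - a.mean(axis=1)[:, None] + a.mean()
    B = b - b.mean(axis=0) - b.mean(axis=1)[:, None] + b.mean()
    dcov2 = (A * B).mean()
    denom = np.sqrt((A * A).mean() * (B * B).mean())
    if denom <= 0:
        return 0.0
    return float(np.sqrt(max(dcov2, 0.0) / denom))


def noisy_sample(f, n, r2, rng):
    """Draw x ~ Uniform(0, 1) and y = f(x) + Gaussian noise.

    The noise variance is chosen so that the population R^2 between
    y and f(x) is approximately r2 (an assumption of this sketch).
    """
    x = rng.uniform(0.0, 1.0, size=n)
    signal = f(x)
    noise_var = signal.var() * (1.0 - r2) / r2
    y = signal + rng.normal(0.0, np.sqrt(noise_var), size=n)
    return x, y


# Illustrative relationship types (assumed, not taken from the paper).
relationships = {
    "linear": lambda t: t,
    "parabolic": lambda t: 4.0 * (t - 0.5) ** 2,
    "sine": lambda t: np.sin(4.0 * np.pi * t),
}

rng = np.random.default_rng(0)
scores = {}
for name, f in relationships.items():
    x, y = noisy_sample(f, n=300, r2=0.8, rng=rng)
    scores[name] = {
        "pearson_sq": float(np.corrcoef(x, y)[0, 1] ** 2),
        "dcor": distance_correlation(x, y),
    }
    print(name, scores[name])

# A perfectly equitable statistic would give all three relationships
# about the same score, because they share the same noise level.
for stat in ("pearson_sq", "dcor"):
    values = [s[stat] for s in scores.values()]
    print(stat, "spread across relationship types:", max(values) - min(values))
```

A small spread across relationship types at a fixed noise level is what equitability asks for; repeating the loop over several values of `r2` and sample sizes gives a rough picture of how a statistic behaves across noise levels.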
