Copula Correlation: An Equitable Dependence Measure and Extension of Pearson's Correlation

Reshef et al. (2011), writing in Science, proposed the concept of equitability for measures of dependence between two random variables and, to that end, introduced a new measure, the maximal information coefficient (MIC). A recent PNAS paper (Kinney and Atwal, 2014) gave a mathematical definition of equitability and proved that MIC is in fact not equitable, whereas a fundamental information-theoretic measure, the mutual information (MI), is self-equitable. In this paper, we show that MI nonetheless fails to correctly reflect the proportion of deterministic signal hidden in noisy data, and we propose a new definition of equitability based on this scenario. The copula correlation (Ccor), based on the L1-distance between the copula density and the independence (uniform) density, is shown to be equitable under both definitions. We also prove theoretically that Ccor is much easier to estimate than MI. Numerical studies illustrate the properties of the measures.
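To make the L1-distance idea concrete, here is a minimal sketch in Python. It assumes Ccor is normalized as half the integrated absolute difference between the copula density and the uniform density on the unit square, and it uses a plain histogram on rank-transformed data as a stand-in density estimate. The function name `ccor_histogram` and the `bins` parameter are hypothetical, and this is not the estimator analyzed in the paper.

```python
import numpy as np


def ccor_histogram(x, y, bins=10):
    """Rough histogram-based sketch of copula correlation (illustrative only).

    Assumes continuous data without ties and the normalization
    Ccor = 0.5 * integral of |c(u, v) - 1| over the unit square.
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    n = len(x)

    # Rank-transform each margin to pseudo-observations in (0, 1).
    u = (np.argsort(np.argsort(x)) + 0.5) / n
    v = (np.argsort(np.argsort(y)) + 0.5) / n

    # Piecewise-constant estimate of the copula density on a bins x bins grid.
    counts, _, _ = np.histogram2d(u, v, bins=bins, range=[[0, 1], [0, 1]])
    density = counts / n * bins**2

    # Each cell has area 1 / bins**2, so the integral is the mean over cells.
    return 0.5 * np.mean(np.abs(density - 1.0))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    x = rng.normal(size=5000)
    print(ccor_histogram(x, rng.normal(size=5000)))  # independent pair: near 0
    print(ccor_histogram(x, x))                      # deterministic pair: near 1
```

With a fixed grid, a perfectly monotone relationship gives 1 - 1/bins rather than exactly 1, and independent samples give a small positive value rather than 0. Both are artifacts of the histogram approximation, not properties of Ccor itself.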

[1]  Michael A. Newton Introducing the discussion paper by Székely and Rizzo , 2010 .

[2]  Song-xi Chen,et al.  Beta kernel estimators for density functions , 1999 .

[3]  J. Kinney,et al.  Equitability, mutual information, and the maximal information coefficient , 2013, Proceedings of the National Academy of Sciences.

[4]  Barnabás Póczos,et al.  Estimation of Renyi Entropy and Mutual Information Based on Generalized Nearest-Neighbor Graphs , 2010, NIPS.

[5]  D. Donoho,et al.  Geometrizing Rates of Convergence, III , 1991 .

[6]  Michael Mitzenmacher,et al.  Equitability Analysis of the Maximal Information Coefficient, with Comparisons , 2013, ArXiv.

[7]  B. Silverman Density estimation for statistics and data analysis , 1986 .

[8]  Friedrich Schmid,et al.  Mutual information as a measure of multivariate association: analytical properties and statistical estimation , 2012 .

[9]  A. Kraskov,et al.  Estimating mutual information. , 2003, Physical review. E, Statistical, nonlinear, and soft matter physics.

[10]  M. Wand,et al.  Multivariate plug-in bandwidth selection , 1994 .

[11]  Friedrich Schmid,et al.  Copula-Based Measures of Multivariate Association , 2010 .

[12]  Barnabás Póczos,et al.  Copula-based Kernel Dependency Measures , 2012, ICML.

[13]  Michael Mitzenmacher,et al.  Detecting Novel Associations in Large Data Sets , 2011, Science.

[14]  M. C. Jones,et al.  Comparison of Smoothing Parameterizations in Bivariate Kernel Density Estimation , 1993 .

[15]  L. L. Cam,et al.  Asymptotic Methods In Statistical Decision Theory , 1986 .

[16]  Ngai Hang Chan,et al.  NONPARAMETRIC TESTS FOR SERIAL DEPENDENCE , 1992 .

[17]  E. Maasoumi,et al.  A Dependence Metric for Possibly Nonlinear Processes , 2004 .

[18]  R. Nelsen An Introduction to Copulas (Springer Series in Statistics) , 2006 .

[19]  I. Gijbels,et al.  Improved kernel estimation of copulas: Weak convergence and goodness-of-fit testing , 2009, 0908.4530.

[20]  C. Tsallis Possible generalization of Boltzmann-Gibbs statistics , 1988 .

[21]  D. W. Scott,et al.  Multivariate Density Estimation, Theory, Practice and Visualization , 1992 .

[22]  Mark Holmes,et al.  Tests of independence among continuous random vectors based on Cramér-von Mises functionals of the empirical copula process , 2009, J. Multivar. Anal.

[23]  L. Bagnato,et al.  Testing Serial Independence via Density-Based Measures of Divergence , 2014 .

[24]  Bernhard Schölkopf,et al.  The Randomized Dependence Coefficient , 2013, NIPS.

[25]  T. Vilmansen On Dependence and Discrimination in Pattern Recognition , 1972, IEEE Transactions on Computers.

[26]  Masashi Sugiyama,et al.  Mutual information approximation via maximum likelihood estimation of density ratio , 2009, 2009 IEEE International Symposium on Information Theory.

[27]  T. Speed A Correlation for the 21st Century , 2011, Science.

[28]  Liam Paninski,et al.  Estimation of Entropy and Mutual Information , 2003, Neural Computation.

[29]  R. H. Farrell On the Best Obtainable Asymptotic Rates of Convergence in Estimation of a Density Function at a Point , 1972 .

[30]  R. Tibshirani,et al.  Comment on "Detecting Novel Associations In Large Data Sets" by Reshef Et Al, Science Dec 16, 2011 , 2014, 1401.7645.

[31]  Timo Koski,et al.  Bounds for the Loss in Probability of Correct Classification Under Model Based Approximation , 2006, J. Mach. Learn. Res.

[32]  D. Donoho,et al.  Geometrizing Rates of Convergence, II , 2008 .

[33]  Maria L. Rizzo,et al.  Brownian distance covariance , 2009, 1010.0297.

[34]  Maria L. Rizzo,et al.  Measuring and testing dependence by correlation of distances , 2007, 0803.4101.

[35]  Bernhard Schölkopf,et al.  Kernel Measures of Conditional Dependence , 2007, NIPS.

[36]  Bernhard Schölkopf,et al.  Measuring Statistical Dependence with Hilbert-Schmidt Norms , 2005, ALT.

[37]  Moon,et al.  Estimation of mutual information using kernel density estimators. , 1995, Physical review. E, Statistical physics, plasmas, fluids, and related interdisciplinary topics.

[38]  A. Rényi On measures of dependence , 1959 .

[39]  M. C. Jones,et al.  A Brief Survey of Bandwidth Selection for Density Estimation , 1996 .

[40]  L. Lecam Convergence of Estimates Under Dimensionality Restrictions , 1973 .

[41]  A. Antos,et al.  Convergence properties of functional estimates for discrete distributions , 2001 .

[42]  S. Saigal,et al.  Relative performance of mutual information estimation methods for quantifying the dependence among short and noisy data. , 2007, Physical review. E, Statistical, nonlinear, and soft matter physics.

[43]  Larry A. Wasserman,et al.  Exponential Concentration for Mutual Information Estimation with Application to Forests , 2012, NIPS.

[44]  D. Tjøstheim Measures of Dependence and Tests of Independence , 1996 .

[45]  R. Heller,et al.  A consistent multivariate test of association based on ranks of distances , 2012, 1201.3522.

[46]  J. Segers Asymptotics of empirical copula processes under non-restrictive smoothness assumptions , 2010, 1012.2133.

[47]  Toomas R. Vilmansen,et al.  Feature Evaluation with Measures of Probabilistic Dependence , 1973, IEEE Transactions on Computers.

[48]  B. Rémillard,et al.  Test of independence and randomness based on the empirical copula process , 2004 .

[49]  P. Bickel,et al.  Nonparametric estimators which can be "plugged-in" , 2003 .

[50]  Suzana de Siqueira Santos,et al.  A comparative study of statistical methods used to identify dependencies between gene expression signals , 2014, Briefings Bioinform.

[51]  B. Schweizer,et al.  On Nonparametric Measures of Dependence for Random Variables , 1981 .

[52]  Bernstein estimator for unbounded copula densities , 2013 .

[53]  Christian Genest,et al.  Asymptotic local efficiency of Cramér–von Mises tests for multivariate independence , 2005, 0708.0485.

[54]  Gregory B. Gloor,et al.  Mutual information is critically dependent on prior assumptions: would the correct estimate of mutual information please identify itself? , 2010, Bioinform.

[55]  H. Joe Relative Entropy Measures of Multivariate Dependence , 1989 .