From Distance Correlation to Multiscale Graph Correlation

Abstract

Understanding and developing a correlation measure that can detect general dependencies is not only imperative to statistics and machine learning, but also crucial to general scientific discovery in the big data age. In this paper, we establish a new framework that generalizes distance correlation (Dcorr) to the multiscale graph correlation (MGC). Dcorr is a recently proposed correlation measure that has been shown to be universally consistent for dependence testing against all joint distributions of finite moments. Using characteristic functions and incorporating the nearest-neighbor machinery, we formalize the population version of local distance correlations, define the optimal scale for a given dependency, and name the optimal local correlation MGC. The new framework motivates a theoretically sound sample MGC and allows a number of desirable properties to be proved, including the universal consistency, convergence, and near-unbiasedness of the sample version. The advantages of MGC are illustrated via a comprehensive set of simulations with linear, nonlinear, univariate, multivariate, and noisy dependencies, where it loses almost no power in monotone dependencies while achieving better performance in general dependencies, compared to Dcorr and other popular methods. Supplementary materials for this article are available online.
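For orientation, the sample distance correlation that MGC generalizes can be sketched in a few lines. This is a minimal illustration of the standard biased (V-statistic) estimator built from double-centered pairwise distance matrices, not the sample MGC described in the paper; the function names are hypothetical, and the sketch uses only the Python standard library (`math.dist` requires Python 3.8 or later).

```python
import math


def _centered_distances(points):
    """Pairwise Euclidean distances, double-centered (rows, columns, grand mean)."""
    n = len(points)
    d = [[math.dist(p, q) for q in points] for p in points]
    row_means = [sum(row) / n for row in d]
    grand_mean = sum(row_means) / n
    # The distance matrix is symmetric, so column means equal row means.
    return [
        [d[i][j] - row_means[i] - row_means[j] + grand_mean for j in range(n)]
        for i in range(n)
    ]


def _mean_product(a, b):
    """Average of the entrywise product of two n-by-n matrices."""
    n = len(a)
    return sum(a[i][j] * b[i][j] for i in range(n) for j in range(n)) / (n * n)


def distance_correlation(x, y):
    """Biased sample distance correlation between paired samples x and y.

    x and y are equal-length lists of coordinate tuples, e.g. [(0.5,), (1.2,)].
    """
    a = _centered_distances(x)
    b = _centered_distances(y)
    dcov2 = _mean_product(a, b)      # squared sample distance covariance
    dvar_x = _mean_product(a, a)     # squared sample distance variance of x
    dvar_y = _mean_product(b, b)     # squared sample distance variance of y
    denom = math.sqrt(dvar_x * dvar_y)
    if denom == 0.0:
        return 0.0                   # a constant sample carries no dependence signal
    return math.sqrt(max(dcov2, 0.0) / denom)


# Illustrative data (an assumption for this sketch): a symmetric quadratic
# relationship, for which the Pearson correlation is zero.
xs = [(-2.0,), (-1.0,), (0.0,), (1.0,), (2.0,)]
quadratic = [(v[0] ** 2,) for v in xs]
linear = [(2.0 * v[0] + 1.0,) for v in xs]

print(distance_correlation(xs, linear))     # perfectly linear case
print(distance_correlation(xs, quadratic))  # nonlinear case
```

The linear pair yields a distance correlation of 1 up to floating-point error, while the quadratic pair yields a clearly positive value even though its Pearson correlation is zero. The local correlations in the paper restrict this kind of computation to nearest-neighbor pairs at a given scale, which this sketch does not attempt.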
