Distribution-free consistent independence tests via Hallin's multivariate rank

This paper studies the problem of testing independence of two random vectors of general dimensions and gives the first distribution-free consistent test for it. Our approach combines distance covariance with a new multivariate rank statistic recently introduced by Hallin (2017). More precisely, the proposed test is consistent and distribution-free in the family of multivariate distributions with nonvanishing (Lebesgue) probability densities. Exploiting the (degenerate) U-statistic structure of the distance covariance and the combinatorial nature of Hallin's rank, we derive the limiting null distribution of our test statistic. The resulting asymptotic approximation is already accurate for moderate sample sizes, so the test can be implemented without permutation. The limiting distribution follows from a more general result that gives a new type of combinatorial non-central limit theorem for double- and multiple-indexed permutation statistics.
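The construction described in the abstract can be illustrated with a minimal sketch, which is not the authors' implementation. It assumes that each sample is mapped to rank vectors by an optimal assignment (squared Euclidean cost) onto reference points in the unit ball, and that the statistic is n times the squared sample distance covariance of the two sets of rank vectors. The function names, the use of random directions in place of a regular grid, and the scaling by n are all assumptions made for illustration.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment
from scipy.spatial.distance import cdist


def reference_grid(n, d, rng):
    """Hypothetical reference points in the unit ball of R^d.

    Radii k/(n+1) with random directions stand in for a regular grid.
    """
    dirs = rng.standard_normal((n, d))
    dirs /= np.linalg.norm(dirs, axis=1, keepdims=True)
    radii = np.arange(1, n + 1) / (n + 1)
    return dirs * radii[:, None]


def center_outward_ranks(x, grid):
    """Match observations to grid points, minimizing total squared distance."""
    cost = cdist(x, grid, metric="sqeuclidean")
    rows, cols = linear_sum_assignment(cost)
    ranks = np.empty_like(grid)
    ranks[rows] = grid[cols]  # observation i receives its matched grid point
    return ranks


def distance_covariance_sq(a, b):
    """Squared sample distance covariance (V-statistic form)."""
    def centered(m):
        dist = cdist(m, m)
        # Double centering: subtract column and row means, add grand mean.
        return (dist - dist.mean(axis=0)
                - dist.mean(axis=1, keepdims=True) + dist.mean())
    return float((centered(a) * centered(b)).mean())


def rank_dcov_statistic(x, y, seed=0):
    """n times the squared distance covariance of the rank vectors (assumed scaling)."""
    rng = np.random.default_rng(seed)
    n = x.shape[0]
    rx = center_outward_ranks(x, reference_grid(n, x.shape[1], rng))
    ry = center_outward_ranks(y, reference_grid(n, y.shape[1], rng))
    return n * distance_covariance_sq(rx, ry)


# Usage on simulated data: one independent pair and one strongly dependent pair.
rng = np.random.default_rng(1)
x = rng.standard_normal((200, 2))
y_indep = rng.standard_normal((200, 3))
y_dep = x + 0.1 * rng.standard_normal((200, 2))
stat_indep = rank_dcov_statistic(x, y_indep)
stat_dep = rank_dcov_statistic(x, y_dep)
```

Because the reference points do not depend on the data, the rank vectors of each sample are a random permutation of a fixed set, which is what makes a statistic of this kind distribution-free under the null. Calibrating the test would still require the limiting null distribution derived in the paper, which this sketch does not attempt.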

[1]  H. Hirschfeld A Connection between Correlation and Contingency , 1935, Mathematical Proceedings of the Cambridge Philosophical Society.

[2]  H. Gebelein Das statistische Problem der Korrelation als Variations‐ und Eigenwertproblem und sein Zusammenhang mit der Ausgleichsrechnung , 1941 .

[3]  J. Wolfowitz,et al.  Statistical Tests Based on Permutations of the Observations , 1944 .

[4]  H. Wool THE RELATION BETWEEN MEASURES OF CORRELATION IN THE UNIVERSE OF SAMPLE PERMUTATIONS , 1944 .

[5]  W. Hoeffding A Non-Parametric Test of Independence , 1948 .

[6]  W. Hoeffding A Combinatorial Central Limit Theorem , 1951 .

[7]  H. W. Kuhn Variants of the Hungarian Method for Assignment Problems , 1955 .

[8]  H. Kuhn The Hungarian method for the assignment problem , 1955 .

[9]  On the Hoeffding’s combinatorial central limit theorem , 1956 .

[10]  J. Munkres ALGORITHMS FOR THE ASSIGNMENT AND TRANSPORTATION PROBLEMS , 1957 .

[11]  A. Rényi New version of the probabilistic generalization of the large sieve , 1959 .

[12]  A. Rényi On measures of dependence , 1959 .

[13]  L. Blumenson A Derivation of n-Dimensional Spherical Coordinates , 1960 .

[14]  J. Kiefer,et al.  DISTRIBUTION FREE TESTS OF INDEPENDENCE BASED ON THE SAMPLE DISTRIBUTION FUNCTION , 1961 .

[15]  J. Hájek Some Extensions of the Wald-Wolfowitz-Noether Theorem , 1961 .

[16]  W. Rudin Principles of mathematical analysis , 1964 .

[17]  K. Atkinson THE NUMERICAL SOLUTION OF THE EIGENVALUE PROBLEM FOR COMPACT INTEGRAL OPERATORS , 2008 .

[18]  K. Jogdeo Asymptotic Normality in Nonparametric Methods , 1968 .

[19]  P. Billingsley,et al.  Convergence of Probability Measures , 1970, The Mathematical Gazette.

[20]  O. Abe A Central Limit Theorem for the Number of Edges in the Random Intersection of Two Graphs , 1969 .

[21]  T. Yanagimoto On measures of association and a related problem , 1970 .

[22]  P. Anselone,et al.  Collectively Compact Operator Approximation Theory and Applications to Integral Equations , 1971 .

[23]  Richard M. Karp,et al.  An $n^{5/2}$ Algorithm for Maximum Matchings in Bipartite Graphs , 1971, SWAT.

[24]  A. R. Bloemena Sampling from a graph , 1976 .

[25]  R. Milner Mathematical Centre Tracts , 1976 .

[26]  M. Degroot,et al.  Probability and Statistics , 1977 .

[27]  W. Beyer CRC Handbook of Mathematical Sciences , 1978 .

[28]  L. Hubert,et al.  Asymptotic Normality of Permutation Statistics Derived from Weighted Sums of Bivariate Functions , 1979 .

[29]  R. J. Serfling Approximation Theorems of Mathematical Statistics , 1980 .

[30]  R. Serfling Approximation Theorems of Mathematical Statistics , 1980 .

[31]  E. Bolthausen An estimate of the remainder in a combinatorial central limit theorem , 1984 .

[32]  R. Farebrother The Distribution of a Positive Linear Combination of χ² Random Variables , 1984 .

[33]  A. Pietsch Eigenvalue distribution of compact operators , 1986 .

[34]  A. Barbour,et al.  Random association of symmetric arrays , 1986 .

[35]  H. König Eigenvalue Distribution of Compact Operators , 1986 .

[36]  W. Rudin Real and complex analysis, 3rd ed. , 1987 .

[37]  S. Kotz,et al.  Symmetric Multivariate and Related Distributions , 1989 .

[38]  Robert E. Tarjan,et al.  Faster Scaling Algorithms for Network Problems , 1989, SIAM J. Comput..

[39]  D. Pham,et al.  Asymptotic normality of double-indexed linear permutation statistics , 1989 .

[40]  A. Feuerverger,et al.  A Consistent Test for Bivariate Dependence , 1993 .

[41]  F. Götze,et al.  The Rate of Convergence for Multivariate Sampling Statistics , 1993 .

[42]  R. McCann Existence and uniqueness of monotone measure-preserving maps , 1995 .

[43]  Z. Bai,et al.  Error bound in a central limit theorem of double-indexed permutation statistics , 1997 .

[44]  A. V. D. Vaart,et al.  Asymptotic Statistics , 1998 .

[45]  Felipe Cucker,et al.  On the mathematical foundations of learning , 2001 .

[46]  D. Paindaveine,et al.  Optimal tests for multivariate location based on interdirections and pseudo-Mahalanobis ranks , 2002 .

[47]  D. Paindaveine,et al.  Multivariate signed ranks: Randles' interdirections or Tyler's angles? , 2002 .

[48]  G. Székely,et al.  Extremal probabilities for Gaussian quadratic forms , 2003 .

[49]  A. Kraskov,et al.  Estimating mutual information. , 2003, Physical review. E, Statistical, nonlinear, and soft matter physics.

[50]  Louis H. Y. Chen,et al.  The permutation distribution of matrix correlation statistics , 2005 .

[51]  J. P. Imhof Computing the distribution of quadratic forms in normal variables , 2005 .

[52]  Bernhard Schölkopf,et al.  Measuring Statistical Dependence with Hilbert-Schmidt Norms , 2005, ALT.

[53]  Bernhard Schölkopf,et al.  Kernel Methods for Measuring Independence , 2005, J. Mach. Learn. Res..

[54]  Bernhard Schölkopf,et al.  Kernel Constrained Covariance for Dependence Measurement , 2005, AISTATS.

[55]  J. Escanciano A CONSISTENT DIAGNOSTIC TEST FOR REGRESSION MODELS USING PROJECTIONS , 2006, Econometric Theory.

[56]  Maria L. Rizzo,et al.  A multivariate nonparametric test of independence , 2006 .

[57]  Le Song,et al.  A Kernel Statistical Test of Independence , 2007, NIPS.

[58]  Maria L. Rizzo,et al.  Measuring and testing dependence by correlation of distances , 2007, 0803.4101.

[59]  C. Villani Optimal Transport: Old and New , 2008 .

[60]  Maria L. Rizzo,et al.  Brownian distance covariance , 2009, 1010.0297.

[61]  G. Reinert,et al.  Multivariate normal approximation with Stein’s method of exchangeable pairs under a general linearity condition , 2007, 0711.1082.

[62]  Bruno Rémillard DISCUSSION OF: BROWNIAN DISTANCE COVARIANCE , 2009, 1010.0838.

[63]  Wicher P. Bergsma,et al.  A consistent test of independence based on a sign covariance related to Kendall's tau , 2010, 1007.4259.

[64]  Michael Mitzenmacher,et al.  Detecting Novel Associations in Large Data Sets , 2011, Science.

[65]  Pankaj K. Agarwal,et al.  Algorithms for the transportation problem in geometric settings , 2012, SODA.

[66]  R. Heller,et al.  A class of multivariate distribution-free tests of independence based on graphs. , 2012, Journal of statistical planning and inference.

[67]  Kenji Fukumizu,et al.  Equivalence of distance-based and RKHS-based statistics in hypothesis testing , 2012, ArXiv.

[68]  R. Heller,et al.  A consistent multivariate test of association based on ranks of distances , 2012, 1201.3522.

[69]  Gábor J. Székely,et al.  The distance correlation t-test of independence in high dimension , 2013, J. Multivar. Anal..

[70]  R. Lyons Distance covariance in metric spaces , 2011, 1106.5758.

[71]  V. Chernozhukov,et al.  Monge-Kantorovich Depth, Quantiles, Ranks and Signs , 2014, 1412.8434.

[72]  Pankaj K. Agarwal,et al.  Approximation algorithms for bipartite matching with metric and geometric costs , 2014, STOC.

[73]  J. Kinney,et al.  Equitability, mutual information, and the maximal information coefficient , 2013, Proceedings of the National Academy of Sciences.

[74]  Real Analysis: A Comprehensive Course in Analysis, Part 1 , 2015 .

[75]  B. Simon Operator Theory: A Comprehensive Course in Analysis, Part 4 , 2015 .

[76]  Michael Mitzenmacher,et al.  Measuring Dependence Powerfully and Equitably , 2015, J. Mach. Learn. Res..

[77]  Ursula Faber,et al.  Theory Of U Statistics , 2016 .

[78]  Malka Gorfine,et al.  Consistent Distribution-Free $K$-Sample and Independence Tests for Univariate Random Variables , 2014, J. Mach. Learn. Res..

[79]  Luca Weihs,et al.  Large-Sample Theory for the Bergsma-Dassios Sign Covariance , 2016, 1602.04387.

[80]  X. Shao,et al.  Testing mutual independence in high dimension via distance covariance , 2016, 1609.09380.

[81]  Xiaoming Huo,et al.  Fast Computing for Distance Covariance , 2014, Technometrics.

[82]  M. E. Jakobsen Distance Covariance in Metric Spaces: Non-Parametric Independence Testing in Metric Spaces (Master's thesis) , 2017, 1706.03490.

[83]  Runze Li,et al.  Projection correlation between two random vectors , 2017, Biometrika.

[84]  M. Hallin On Distribution and Quantile Functions, Ranks and Signs in $\mathbb{R}^d$ , 2017 .

[85]  J. A. Cuesta-Albertos,et al.  Smooth Cyclically Monotone Interpolation and Empirical Center-Outward Distribution Functions , 2018 .

[86]  N. Meinshausen,et al.  Symmetric rank covariances: a generalized framework for nonparametric measures of dependence , 2017, Biometrika.

[87]  L. Wasserman,et al.  Robust Multivariate Nonparametric Tests via Projection-Pursuit , 2018, 1803.00715.

[88]  A. Figalli On the continuity of center-outward distribution and quantile functions , 2018, Nonlinear Analysis.

[89]  Li Ma,et al.  Fisher Exact Scanning for Dependency , 2016, Journal of the American Statistical Association.

[90]  David N. Reshef,et al.  An empirical study of the maximal and total information coefficients and leading measures of dependence , 2018 .

[91]  R. Lyons Errata to “Distance covariance in metric spaces” , 2018, The Annals of Probability.

[92]  E. Barrio,et al.  Smooth Cyclically Monotone Interpolation and Empirical Center-Outward Distribution Functions , 2018 .

[93]  B. Sen,et al.  Multivariate Ranks and Quantiles using Optimal Transportation and Applications to Goodness-of-fit Testing , 2019 .

[94]  Thomas B. Berrett,et al.  Nonparametric independence testing via mutual information , 2017, Biometrika.

[95]  Kai Zhang,et al.  BET on Independence , 2016, Journal of the American Statistical Association.

[96]  Heping Zhang,et al.  Ball Covariance: A Generic Measure of Dependence in Banach Space , 2019, Journal of the American Statistical Association.

[97]  Bodhisattva Sen,et al.  Multivariate Rank-Based Distribution-Free Nonparametric Testing Using Measure Transportation , 2019, Journal of the American Statistical Association.