Multivariate Rank-Based Distribution-Free Nonparametric Testing Using Measure Transportation

In this paper, we propose a general framework for distribution-free nonparametric testing in multiple dimensions, based on a notion of multivariate ranks defined using the theory of measure transportation. Unlike other existing proposals in the literature, these multivariate ranks share a number of useful properties with the usual one-dimensional ranks; most importantly, they are distribution-free. This crucial observation allows us to design nonparametric tests that are exactly distribution-free under the null hypothesis. We demonstrate the applicability of this approach by constructing exact distribution-free tests for two classical nonparametric problems: (i) testing for mutual independence between random vectors, and (ii) testing for the equality of multivariate distributions. In particular, we propose (multivariate) rank versions of distance covariance (Székely et al., 2007) and the energy statistic (Székely and Rizzo, 2013) for testing scenarios (i) and (ii), respectively. In both problems, we derive the asymptotic null distribution of the proposed test statistics. We further show that our tests are consistent against all fixed alternatives. Moreover, the proposed tests are tuning-free, computationally feasible, and well-defined under minimal assumptions on the underlying distributions (e.g., they do not need any moment assumptions). We also demonstrate the efficacy of these procedures via extensive simulations. In analyzing the theoretical properties of our procedures, we prove some new results in the theory of measure transportation and in the limit theory of permutation statistics using Stein's method for exchangeable pairs, which may be of independent interest.
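
To make the idea of transport-based ranks concrete, the following is a minimal sketch and not the authors' implementation. It assumes that empirical ranks are obtained by optimally matching the pooled sample to a fixed set of reference points under squared Euclidean cost, and that a rank version of the energy statistic is the usual energy statistic evaluated on those ranks. The Halton-type reference points, the brute-force matching (feasible only for very small samples), the Monte Carlo calibration, and all function names are illustrative assumptions.

```python
import itertools
import math
import random


def radical_inverse(i, base):
    """Van der Corput radical inverse of the integer i in the given base."""
    inv, f = 0.0, 1.0 / base
    while i > 0:
        inv += (i % base) * f
        i //= base
        f /= base
    return inv


def reference_points(n):
    """n fixed points in [0, 1]^2 (Halton-type; an illustrative choice)."""
    return [(radical_inverse(i, 2), radical_inverse(i, 3)) for i in range(1, n + 1)]


def empirical_ranks(data, ref):
    """Rank of data[i] = reference point matched to it by the assignment that
    minimizes the total squared Euclidean distance (brute force, tiny n only)."""
    n = len(data)

    def cost(perm):
        return sum(math.dist(data[i], ref[perm[i]]) ** 2 for i in range(n))

    best = min(itertools.permutations(range(n)), key=cost)
    return [ref[j] for j in best]


def energy(xs, ys):
    """V-statistic form of the energy statistic between two point sets."""
    def mean_dist(a, b):
        return sum(math.dist(p, q) for p in a for q in b) / (len(a) * len(b))

    return 2 * mean_dist(xs, ys) - mean_dist(xs, xs) - mean_dist(ys, ys)


def rank_energy_test(x, y, n_perm=500, seed=0):
    """Energy statistic on pooled ranks, with a Monte Carlo p-value."""
    m = len(x)
    ref = reference_points(m + len(y))
    ranks = empirical_ranks(x + y, ref)
    observed = energy(ranks[:m], ranks[m:])
    # The reference law below is built from the fixed reference points and the
    # sample sizes only; the data are never used in this loop.
    rng = random.Random(seed)
    exceed = 0
    for _ in range(n_perm):
        shuffled = ref[:]
        rng.shuffle(shuffled)
        if energy(shuffled[:m], shuffled[m:]) >= observed:
            exceed += 1
    return observed, (1 + exceed) / (1 + n_perm)


# Toy data: two small bivariate Gaussian samples with different means.
rng = random.Random(1)
x = [(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(4)]
y = [(rng.gauss(3, 1), rng.gauss(3, 1)) for _ in range(4)]
stat, p_value = rank_energy_test(x, y)
print(stat, p_value)
```

In this sketch the ranks are always a rearrangement of the same reference points, whatever the data distribution, which is the sense in which the calibration step does not depend on the data. For realistic sample sizes the brute-force search would have to be replaced by a dedicated assignment solver.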

[1]  Thomas B. Berrett,et al.  Nonparametric independence testing via mutual information , 2017, Biometrika.

[2]  M. Mukaka,et al.  Statistics corner: A guide to appropriate use of correlation coefficient in medical research. , 2012, Malawi medical journal : the journal of Medical Association of Malawi.

[3]  Subir Ghosh,et al.  Multivariate analysis, design of experiments, and survey sampling , 2000 .

[4]  Sigeo Aki,et al.  On nonparametric tests for symmetry in R^m , 1993 .

[5]  Catherine Blake,et al.  UCI Repository of machine learning databases , 1998 .

[6]  H. B. Mann,et al.  On a Test of Whether one of Two Random Variables is Stochastically Larger than the Other , 1947 .

[7]  Chih-Chou Chiu,et al.  Financial time series forecasting using independent component analysis and support vector regression , 2009, Decis. Support Syst..

[8]  Louis H. Y. Chen,et al.  On the error bound in a combinatorial central limit theorem , 2011, 1111.3159.

[9]  Arthur Gretton,et al.  Consistent Nonparametric Tests of Independence , 2010, J. Mach. Learn. Res..

[10]  L. Baringhaus,et al.  On a new multivariate two-sample test , 2004 .

[11]  M. Drton,et al.  Distribution-Free Consistent Independence Tests via Center-Outward Ranks and Signs , 2019, Journal of the American Statistical Association.

[12]  Carlos Matrán,et al.  Distribution and quantile functions, ranks and signs in dimension d: A measure transportation approach , 2021, The Annals of Statistics.

[13]  Deyu Meng,et al.  FastMMD: Ensemble of Circular Discrepancy for Efficient Two-Sample Test , 2014, Neural Computation.

[14]  Christian P. Robert,et al.  Monte Carlo Statistical Methods , 2005, Springer Texts in Statistics.

[15]  S. Holmes,et al.  Measuring multivariate association and beyond. , 2016, Statistics surveys.

[16]  E. Hlawka Funktionen von beschränkter Variation in der Theorie der Gleichverteilung , 1961 .

[17]  B. Bhattacharya A general asymptotic framework for distribution‐free graph‐based two‐sample tests , 2015, Journal of the Royal Statistical Society: Series B (Statistical Methodology).

[18]  Maria L. Rizzo,et al.  Measuring and testing dependence by correlation of distances , 2007, 0803.4101.

[19]  Mark Holmes,et al.  Tests of independence among continuous random vectors based on Cramér-von Mises functionals of the empirical copula process , 2009, J. Multivar. Anal..

[20]  R. Rockafellar Characterization of the subdifferentials of convex functions , 1966 .

[21]  Roswitha Hofer On the distribution properties of Niederreiter–Halton sequences , 2009 .

[22]  Caren Marzban,et al.  Using labeled data to evaluate change detectors in a multivariate streaming environment , 2009, Signal Process..

[23]  F. Mosteller On Some Useful "Inefficient" Statistics , 1946 .

[24]  Arnold Neumaier,et al.  Introduction to Numerical Analysis , 2001 .

[25]  Marc Hallin,et al.  Rank-based optimal tests of the adequacy of an elliptic VARMA model , 2004 .

[26]  Zaïd Harchaoui,et al.  A Fast, Consistent Kernel Two-Sample Test , 2009, NIPS.

[27]  R Core Team,et al.  R: A language and environment for statistical computing. , 2014 .

[28]  S. Rachev,et al.  Testing Multivariate Symmetry , 1995 .

[29]  Han Liu,et al.  Center-Outward R-Estimation for Semiparametric VARMA Models , 2019, Journal of the American Statistical Association.

[30]  Nicholas A. James Multiple Change Point Analysis Of Multivariate Data Via Energy Statistics , 2015 .

[31]  Herman Chernoff,et al.  ASYMPTOTIC NORMALITY AND EFFICIENCY OF CERTAIN NONPARAMETRIC TEST STATISTICS , 1958 .

[32]  N. Meinshausen,et al.  Symmetric rank covariances: a generalized framework for nonparametric measures of dependence , 2017, Biometrika.

[33]  R. Heller,et al.  A class of multivariate distribution-free tests of independence based on graphs. , 2012, Journal of statistical planning and inference.

[34]  P. Gruber,et al.  Funktionen von beschränkter Variation in der Theorie der Gleichverteilung , 1990 .

[35]  B. Sen,et al.  Inconsistency of bootstrap: The Grenander estimator , 2010, 1010.3825.

[36]  A Class of Probability Metrics and its Statistical Applications , 2002 .

[37]  M. Drton,et al.  Distribution-free consistent independence tests via Hallin's multivariate rank , 2019, 1909.10024.

[38]  Maria L. Rizzo,et al.  Brownian distance covariance , 2009, 1010.0297.

[39]  Bernhard Schölkopf,et al.  A Kernel Two-Sample Test , 2012, J. Mach. Learn. Res..

[40]  Hannu Oja,et al.  Multivariate spatial sign and rank methods , 1995 .

[41]  D. Cruz-Uribe,et al.  SHARP ERROR BOUNDS FOR THE TRAPEZOIDAL RULE AND SIMPSON'S RULE , 2002 .

[42]  M. Drton,et al.  High dimensional independence testing with maxima of rank correlations , 2018 .

[43]  Kenji Fukumizu,et al.  Equivalence of distance-based and RKHS-based statistics in hypothesis testing , 2012, ArXiv.

[44]  Divyansh Agarwal,et al.  Distribution-Free Multisample Test Based on Optimal Matching with Applications to Single Cell Genomics , 2019, 1906.04776.

[45]  A. Figalli,et al.  $W^{2,1}$ regularity for solutions of the Monge–Ampère equation , 2013 .

[46]  Xiaoming Huo,et al.  Fast Computing for Distance Covariance , 2014, Technometrics.

[47]  Hannu Oja,et al.  AFFINE INVARIANT MULTIVARIATE RANK TESTS FOR SEVERAL SAMPLES , 1998 .

[48]  David N. Reshef,et al.  An empirical study of the maximal and total information coefficients and leading measures of dependence , 2018 .

[49]  P. Bickel A Distribution Free Version of the Smirnov Two Sample Test in the $p$-Variate Case , 1969 .

[50]  L. Caffarelli Interior $W^{2,p}$ estimates for solutions of the Monge-Ampère equation , 1990 .

[51]  Jon A. Wellner,et al.  Weak Convergence and Empirical Processes: With Applications to Statistics , 1996 .

[52]  S. Geer Applications of empirical process theory , 2000 .

[53]  Olivier Cappé,et al.  Homogeneity and change-point detection tests for multivariate data using rank statistics , 2011, 1107.1971.

[54]  Knut Conradsen,et al.  A test statistic in the complex Wishart distribution and its application to change detection in polarimetric SAR data , 2003, IEEE Trans. Geosci. Remote. Sens..

[55]  Hannu Oja,et al.  ON THE EFFICIENCY OF MULTIVARIATE SPATIAL SIGN AND RANK TESTS , 1997 .

[56]  Lev Klebanov,et al.  Multivariate search for differentially expressed gene combinations , 2004, BMC Bioinformatics.

[57]  H. Oja Multivariate Nonparametric Methods with R , 2010 .

[58]  P. Sen,et al.  Nonparametric methods in multivariate analysis , 1974 .

[59]  W. K. Hastings,et al.  Monte Carlo Sampling Methods Using Markov Chains and Their Applications , 1970 .

[60]  C. Spearman The proof and measurement of association between two things. By C. Spearman, 1904. , 1987, The American journal of psychology.

[61]  M. Lavielle,et al.  Detection of multiple change-points in multivariate time series , 2006 .

[62]  B. Sen,et al.  Multivariate Ranks and Quantiles using Optimal Transportation and Applications to Goodness-of-fit Testing , 2019 .

[63]  N. Henze A MULTIVARIATE TWO-SAMPLE TEST BASED ON THE NUMBER OF NEAREST NEIGHBOR TYPE COINCIDENCES , 1988 .

[64]  Diane J. Cook,et al.  A survey of methods for time series change point detection , 2017, Knowledge and Information Systems.

[65]  S. S. Wilks The Large-Sample Distribution of the Likelihood Ratio for Testing Composite Hypotheses , 1938 .

[66]  Lauwerens Kuipers,et al.  Uniform distribution of sequences , 1974 .

[67]  A. Feuerverger,et al.  A Consistent Test for Bivariate Dependence , 1993 .

[68]  H. Oja,et al.  Sign test of independence between two random vectors , 2003 .

[69]  J. Friedman,et al.  Multivariate generalizations of the Wald--Wolfowitz and Smirnov two-sample tests , 1979 .

[70]  Statistical comparison of the geometry of second-phase particles , 2009 .

[71]  Ruth Heller,et al.  Multivariate tests of association based on univariate tests , 2016, NIPS.

[72]  Tamás F. Móri,et al.  Four simple axioms of dependence measures , 2018, Metrika.

[73]  J. Kiefer,et al.  DISTRIBUTION FREE TESTS OF INDEPENDENCE BASED ON THE SAMPLE DISTRIBUTION FUNCTION , 1961 .

[74]  Maria L. Rizzo,et al.  Energy statistics: A class of statistics based on distances , 2013 .

[75]  Harald Niederreiter,et al.  Random number generation and Quasi-Monte Carlo methods , 1992, CBMS-NSF regional conference series in applied mathematics.

[76]  S. Kotz,et al.  Correlation and dependence , 2001 .

[77]  C. Villani Topics in Optimal Transportation , 2003 .

[78]  R. McCann Existence and uniqueness of monotone measure-preserving maps , 1995 .

[79]  J. Marden,et al.  An Approach to Multivariate Rank Tests in Multivariate Analysis of Variance , 1997 .

[80]  Jan Hauke,et al.  Comparison of Values of Pearson's and Spearman's Correlation Coefficients on the Same Sets of Data , 2011 .

[81]  J. L. Hodges,et al.  The Efficiency of Some Nonparametric Competitors of the t-Test , 1956 .

[82]  H. Oja Multivariate Nonparametric Methods with R: An approach based on spatial signs and ranks , 2010 .

[83]  Frank Wilcoxon,et al.  Probability tables for individual comparisons by ranking methods. , 1947 .

[84]  Thomas Mayer,et al.  Selecting Economic Hypotheses by Goodness of Fit , 1975 .

[85]  Jean-Philippe Vert,et al.  The group fused Lasso for multiple change-point detection , 2011, 1106.4199.

[86]  Lionel Weiss,et al.  Two-Sample Tests for Multivariate Distributions , 1960 .

[87]  Nils Blomqvist,et al.  On a Measure of Dependence Between two Random Variables , 1950 .

[88]  T. W. Anderson On the Distribution of the Two-Sample Cramer-von Mises Criterion , 1962 .

[89]  Arthur Cayley,et al.  The Collected Mathematical Papers: On Monge's “Mémoire sur la théorie des déblais et des remblais” , 2009 .

[90]  Maria L. Rizzo,et al.  New Goodness-of-Fit Tests for Pareto Distributions , 2009 .

[91]  R. Iman,et al.  A distribution-free approach to inducing rank correlation among input variables , 1982 .

[92]  Michael A. Newton Introducing the discussion paper by Székely and Rizzo , 2010 .

[93]  P. Visscher,et al.  A versatile gene-based test for genome-wide association studies. , 2010, American journal of human genetics.

[94]  Rudolf Beran,et al.  Testing for Ellipsoidal Symmetry of a Multivariate Density , 1979 .

[95]  Ery Arias-Castro,et al.  On the Consistency of the Crossmatch Test , 2015, 1509.05790.

[96]  Bodhisattva Sen,et al.  Testing independence and goodness-of-fit in linear models , 2013 .

[97]  W. Hoeffding A Non-Parametric Test of Independence , 1948 .

[98]  R. Caflisch,et al.  Quasi-Monte Carlo integration , 1995 .

[99]  Adam Petrie,et al.  Graph-theoretic multisample tests of equality in distribution for high dimensional data , 2016, Comput. Stat. Data Anal..

[100]  Bharath K. Sriperumbudur,et al.  Discussion of: Brownian distance covariance , 2009, 1010.0836.

[101]  David J. Hand,et al.  Statistical fraud detection: A review , 2002 .

[102]  Xinyi Xu,et al.  Optimal Nonbipartite Matching and Its Statistical Applications , 2011, The American statistician.

[103]  S. Csörgo Testing for independence by the empirical characteristic function , 1985 .

[104]  R. Serfling Multivariate Symmetry and Asymmetry , 2006 .

[105]  William Kruskal,et al.  A Nonparametric test for the Several Sample Problem , 1952 .

[106]  Le Song,et al.  A Kernel Statistical Test of Independence , 2007, NIPS.

[107]  A. Volgenant,et al.  A shortest augmenting path algorithm for dense and sparse linear assignment problems , 1987, Computing.

[108]  R. Dudley Central Limit Theorems for Empirical Measures , 1978 .

[109]  Dylan S. Small,et al.  Using the Cross-Match Test to Appraise Covariate Balance in Matched Pairs , 2010 .

[110]  Michael Mitzenmacher,et al.  Measuring Dependence Powerfully and Equitably , 2015, J. Mach. Learn. Res..

[111]  R. Randles,et al.  Multivariate Nonparametric Tests of Independence , 2005 .

[112]  Bernhard Schölkopf,et al.  Kernel Methods for Measuring Independence , 2005, J. Mach. Learn. Res..

[113]  R. Forthofer,et al.  Rank Correlation Methods , 1981 .

[114]  Gerhard Larcher,et al.  On existence and discrepancy of certain digital Niederreiter-Halton sequences , 2010 .

[115]  Anil K. Ghosh,et al.  A distribution-free two-sample run test applicable to high-dimensional data , 2014 .

[116]  V. Spokoiny,et al.  Multivariate Brenier cumulative distribution functions and their application to non-parametric testing , 2018, 1809.04090.

[117]  R. Heller,et al.  A consistent multivariate test of association based on ranks of distances , 2012, 1201.3522.

[118]  Jerome H. Friedman,et al.  A New Graph-Based Two-Sample Test for Multivariate and Object Data , 2013, 1307.6294.

[119]  M. Hallin On Distribution and Quantile Functions, Ranks and Signs in R^d , 2017 .

[120]  Guangyuan Yang The Energy Goodness-of-fit Test for Univariate Stable Distributions , 2012 .

[121]  Marc Hallin,et al.  Parametric and semiparametric inference for shape: the role of the scale functional , 2006 .

[122]  Rebecca A. Betensky,et al.  Testing Quasi-Independence of Failure and Truncation Times via Conditional Kendall's Tau , 2005 .

[123]  P. Rosenbaum An exact distribution‐free test comparing two multivariate distributions based on adjacency , 2005 .

[124]  Ronald H. Randles,et al.  Multivariate rank tests for the two-sample location problem , 1990 .

[125]  K. Pearson NOTES ON THE HISTORY OF CORRELATION , 1920 .

[126]  M. Rosenblatt A Quadratic Measure of Deviation of Two-Dimensional Density Estimates and A Test of Independence , 1975 .

[127]  Valentin Rousson,et al.  On Distribution-Free Tests for the Multivariate Two-Sample Location-Scale Model , 2002 .

[128]  D. Bertsekas The auction algorithm: A distributed relaxation method for the assignment problem , 1988 .

[129]  M. Drton,et al.  Rate-Optimality of Consistent Distribution-Free Tests of Independence Based on Center-Outward Ranks and Signs , 2020 .

[130]  Bernhard Schölkopf,et al.  Measuring Statistical Dependence with Hilbert-Schmidt Norms , 2005, ALT.

[131]  Soumendu Sundar Mukherjee,et al.  Weak convergence and empirical processes , 2019 .

[132]  Kurt Hornik,et al.  kernlab - An S4 Package for Kernel Methods in R , 2004 .

[133]  A. Rényi On measures of dependence , 1959 .

[134]  M. Wigler,et al.  Circular binary segmentation for the analysis of array-based DNA copy number data. , 2004, Biostatistics.

[135]  Ann. Probab Distance Covariance in Metric Spaces , 2017 .

[136]  T. Dishion,et al.  Middle Childhood Antecedents to Progressions in Male Adolescent Substance Use , 1999 .

[137]  Gábor J. Székely,et al.  Hierarchical Clustering via Joint Between-Within Distances: Extending Ward's Minimum Variance Method , 2005, J. Classif..

[138]  Peter J. Bickel,et al.  On Some Asymptotically Nonparametric Competitors of Hotelling's $T^2$ , 1965 .

[139]  Y. Brenier Polar Factorization and Monotone Rearrangement of Vector-Valued Functions , 1991 .

[140]  D. Schopflocher,et al.  Between intention and behavior: an application of community pharmacists' assessment of pharmaceutical care. , 1999, Social science & medicine.

[141]  Hannu Oja,et al.  Multivariate Nonparametric Tests , 2004 .

[142]  D. Hawkins Fitting multiple change-point models to data , 2001 .

[143]  J. Stoer,et al.  Introduction to Numerical Analysis , 2002 .

[144]  Sanford Weisberg,et al.  An R Companion to Applied Regression , 2010 .

[145]  John L. Graham,et al.  A Field Study of Causal Inferences and Consumer Reaction: The View from the Airport , 1987 .

[146]  R. Lyons Distance covariance in metric spaces , 2011, 1106.5758.

[147]  Pranab Kumar Sen,et al.  On a Class of Multivariate Multisample Rank Order Tests II: Tests for Homogeneity of Dispersion Matrices , 2015 .

[148]  J. Wolfowitz,et al.  On a Test Whether Two Samples are from the Same Population , 1940 .

[149]  Victor M. Panaretos,et al.  Fréchet means and Procrustes analysis in Wasserstein space , 2017, Bernoulli.

[150]  Bruno Rémillard DISCUSSION OF: BROWNIAN DISTANCE COVARIANCE , 2009, 1010.0838.

[151]  H. Kuo Gaussian Measures in Banach Spaces , 1975 .

[152]  Arthur Gretton,et al.  Nonparametric Independence Tests: Space Partitioning and Kernel Approaches , 2008, ALT.

[153]  Anil K. Ghosh,et al.  On some exact distribution-free tests of independence between two random vectors of arbitrary dimensions , 2016 .

[154]  K. Pillai,et al.  Power comparisons of tests of two multivariate hypotheses based on four criteria. , 1967, Biometrika.

[155]  P. Sen,et al.  Theory of rank tests , 1969 .

[156]  Regina Y. Liu,et al.  A Quality Index Based on Data Depth and Multivariate Rank Tests , 1993 .

[157]  J. Munkres ALGORITHMS FOR THE ASSIGNMENT AND TRANSPORTATION PROBLEMS , 1957 .

[158]  J. Friedman,et al.  Graph-Theoretic Measures of Multivariate Association and Prediction , 1983 .

[159]  Q. Shao,et al.  Stein's Method of Exchangeable Pairs with Application to the Curie-Weiss Model , 2009, 0907.4450.

[160]  P. Rabinowitz The convergence of interpolatory product integration rules , 1986 .

[161]  M. Schilling Multivariate Two-Sample Tests Based on Nearest Neighbors , 1986 .

[162]  Maria L. Rizzo,et al.  A multivariate nonparametric test of independence , 2006 .

[163]  R. Serfling,et al.  General notions of statistical depth function , 2000 .

[164]  Dylan S. Small,et al.  Sensitivity Analysis for the Cross-Match Test, With Applications in Genomics , 2010 .

[165]  William R. Dillon,et al.  A Probabilistic Model For Testing Hypothesized Hierarchical Market Structures , 1985 .

[166]  R. Randles,et al.  A Nonparametric Test of Independence between Two Vectors , 1997 .

[167]  A. Figalli,et al.  $W^{2,1}$ regularity for solutions of the Monge-Ampère equation , 2011, 1111.7207.

[168]  David S. Matteson,et al.  Independent Component Analysis via Distance Covariance , 2013, 1306.4911.

[169]  Jean-François Quessy,et al.  Applications and asymptotic power of marginal-free tests of stochastic vectorial independence , 2010 .

[170]  M. Kendall A NEW MEASURE OF RANK CORRELATION , 1938 .

[171]  P. Chaudhuri On a geometric notion of quantiles for multivariate data , 1996 .

[172]  V. Chernozhukov,et al.  Monge-Kantorovich Depth, Quantiles, Ranks and Signs , 2014, 1412.8434.

[173]  Wicher P. Bergsma,et al.  A consistent test of independence based on a sign covariance related to Kendall's tau , 2010, 1007.4259.