Remember the curse of dimensionality: the case of goodness-of-fit testing in arbitrary dimension

ABSTRACT Despite a substantial literature on nonparametric two-sample goodness-of-fit testing in arbitrary dimensions, that literature makes no mention of any curse of dimensionality. In fact, some publications derive a parametric rate. As we discuss below, this is because those works consider a directional alternative. Indeed, even in dimension one, Ingster, Y. I. [(1987). Minimax testing of nonparametric hypotheses on a distribution density in the L_p metrics. Theory of Probability & Its Applications, 31(2), 333–337] has shown that the minimax rate is not parametric. In this paper, we extend his results to arbitrary dimension and confirm that the minimax rate is not only nonparametric, but also exhibits a prototypical curse of dimensionality. We further extend Ingster's work to show that the chi-squared test achieves the minimax rate. Moreover, we show that the test adapts to the intrinsic dimensionality of the data. Finally, in the spirit of Ingster, Y. I. [(2000). Adaptive chi-square tests. Journal of Mathematical Sciences, 99(2), 1110–1119], we consider a multiscale version of the chi-squared test, showing that one can adapt to unknown smoothness without much loss in power.
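As a rough illustration of the kind of binned chi-squared two-sample test the abstract refers to, the following Python sketch partitions the unit cube into a regular grid and compares cell counts. It is a minimal sketch under stated assumptions, not the paper's construction: the data are assumed to lie in [0, 1]^d, the statistic is the classical two-sample chi-squared form, calibration is done by permutation, and the names `binned_chi2_statistic`, `permutation_pvalue`, and `bins_per_axis` are hypothetical.

```python
import numpy as np


def binned_chi2_statistic(x, y, bins_per_axis):
    """Classical two-sample chi-squared statistic on a regular grid.

    Assumption: both samples are arrays of shape (size, d) with values in [0, 1].
    """
    m, d = x.shape
    n = y.shape[0]
    dims = (bins_per_axis,) * d

    def cell_ids(z):
        # Integer grid coordinates per point, clipped so that 1.0 falls in the last cell.
        idx = np.clip((z * bins_per_axis).astype(int), 0, bins_per_axis - 1)
        # Flatten the d grid coordinates into a single cell index.
        return np.ravel_multi_index(tuple(idx.T), dims)

    n_cells = bins_per_axis ** d  # grows exponentially in d
    cx = np.bincount(cell_ids(x), minlength=n_cells)
    cy = np.bincount(cell_ids(y), minlength=n_cells)
    total = cx + cy
    occupied = total > 0  # empty cells contribute nothing
    diff = np.sqrt(n / m) * cx[occupied] - np.sqrt(m / n) * cy[occupied]
    return float(np.sum(diff ** 2 / total[occupied]))


def permutation_pvalue(x, y, bins_per_axis, n_perm=200, seed=0):
    """Permutation p-value for the statistic above (calibration choice is an assumption)."""
    rng = np.random.default_rng(seed)
    observed = binned_chi2_statistic(x, y, bins_per_axis)
    pooled = np.vstack([x, y])
    m = x.shape[0]
    exceed = 0
    for _ in range(n_perm):
        perm = rng.permutation(pooled.shape[0])
        stat = binned_chi2_statistic(pooled[perm[:m]], pooled[perm[m:]], bins_per_axis)
        exceed += int(stat >= observed)
    # Add-one correction keeps the p-value strictly positive.
    return (1 + exceed) / (1 + n_perm)


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    x = rng.random((200, 2))
    y = rng.random((200, 2)) ** 2  # a different distribution on [0, 1]^2
    print(permutation_pvalue(x, y, bins_per_axis=4))
```

The number of cells is `bins_per_axis ** d`, so for a fixed resolution the grid becomes very sparse as the dimension grows, which gives some intuition for the curse of dimensionality discussed above. A multiscale variant would repeat the test over several grid resolutions with a multiplicity correction; that step is not shown here.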

[1]  Istituto italiano degli attuari Giornale dell'Istituto italiano degli attuari , 1930 .

[2]  A. N. Kolmogorov  Sulla determinazione empirica di una legge di distribuzione , 1933 .

[3]  J. Wolfowitz,et al.  On a Test Whether Two Samples are from the Same Population , 1940 .

[4]  Frederick Mosteller,et al.  Note on an Application of Runs to Quality Control Charts , 1941 .

[5]  H. Hotelling A Generalized T Test and Measure of Multivariate Dispersion , 1951 .

[6]  P. Bickel A Distribution Free Version of the Smirnov Two Sample Test in the $p$-Variate Case , 1969 .

[7]  J. Friedman,et al.  Multivariate generalizations of the Wald–Wolfowitz and Smirnov two-sample tests , 1979 .

[8]  M. Schilling Multivariate Two-Sample Tests Based on Nearest Neighbors , 1986 .

[9]  Yu. I. Ingster Minimax Testing of Nonparametric Hypotheses on a Distribution Density in the $L_p$ Metrics , 1987 .

[10]  L. Klebanov,et al.  A characterization of distributions by mean values of statistics and certain probabilistic metrics , 1992 .

[11]  M. Faddy,et al.  Likelihood Computations for Extended Poisson Process Models , 1999 .

[12]  Yu. I. Ingster Adaptive chi-square tests , 2000 .

[13]  P. Hall,et al.  Permutation tests for equality of distributions in high‐dimensional settings , 2002 .

[14]  B. Laurent,et al.  ADAPTIVE TESTS OF LINEAR HYPOTHESES BY MODEL SELECTION , 2003 .

[15]  A. Berlinet,et al.  Reproducing kernel Hilbert spaces in probability and statistics , 2004 .

[16]  G. Székely,et al.  TESTING FOR EQUAL DISTRIBUTIONS IN HIGH DIMENSION , 2004 .

[17]  P. Rosenbaum An exact distribution‐free test comparing two multivariate distributions based on adjacency , 2005 .

[18]  Bernhard Schölkopf,et al.  A Kernel Method for the Two-Sample-Problem , 2006, NIPS.

[19]  Le Song,et al.  A Hilbert Space Embedding for Distributions , 2007, IFIP Working Conference on Database Semantics.

[20]  Le Song,et al.  A Hilbert Space Embedding for Distributions , 2007, Discovery Science.

[21]  Jerome H. Friedman,et al.  A NONPARAMETRIC PROCEDURE FOR COMPARING MULTIVARIATE POINT SETS , 2007 .

[22]  Goodness of fit and homogeneity tests on the basis of N-distances , 2009 .

[23]  Ofer Levi,et al.  Networks of polynomial pieces with application to the analysis of point clouds and images , 2007, J. Approx. Theory.

[24]  Bernhard Schölkopf,et al.  Hilbert Space Embeddings and Metrics on Probability Measures , 2009, J. Mach. Learn. Res..

[25]  Samory Kpotufe,et al.  k-NN Regression Adapts to Local Intrinsic Dimension , 2011, NIPS.

[26]  Larry A. Wasserman,et al.  Manifold Estimation and Singular Deconvolution Under Hausdorff Loss , 2011, ArXiv.

[27]  Guangliang Chen,et al.  Spectral clustering based on local linear approximations , 2010, 1001.1323.

[28]  Sivaraman Balakrishnan,et al.  Optimal kernel choice for large-scale two-sample tests , 2012, NIPS.

[29]  Larry A. Wasserman,et al.  Minimax Manifold Estimation , 2010, J. Mach. Learn. Res..

[30]  B. Bhattacharya Power of Graph-Based Two-Sample Tests , 2015 .

[31]  Arlene K. H. Kim,et al.  Tight minimax rates for manifold estimation under Hausdorff loss , 2015 .

[32]  Sashank J. Reddi,et al.  On the Decreasing Power of Kernel and Distance Based Nonparametric Hypothesis Tests in High Dimensions , 2014, AAAI.

[33]  Barnabás Póczos,et al.  Adaptivity and Computation-Statistics Tradeoffs for Kernel and Distance based High Dimensional Two Sample Testing , 2015, ArXiv.

[34]  Barnabás Póczos,et al.  On the High Dimensional Power of a Linear-Time Two Sample Test under Mean-shift Alternatives , 2015, AISTATS.

[35]  B. Bhattacharya A general asymptotic framework for distribution‐free graph‐based two‐sample tests , 2015, Journal of the Royal Statistical Society: Series B (Statistical Methodology).