Learning Kernel Tests Without Data Splitting

Modern large-scale kernel-based tests such as the maximum mean discrepancy (MMD) and the kernelized Stein discrepancy (KSD) optimize kernel hyperparameters on a held-out sample via data splitting in order to obtain a powerful test statistic. While data splitting yields a tractable null distribution, it reduces test power because the test is run on a smaller sample. Inspired by the selective inference framework, we propose an approach that learns the hyperparameters and tests on the full sample without data splitting. Learning on the same data used for testing introduces a dependency between the learned hyperparameters and the test statistic; our approach correctly calibrates the test in the presence of this dependency and yields a test threshold in closed form. At the same significance level, the test power of our approach is empirically larger than that of the data-splitting approach, regardless of the split proportion used by the latter.
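
To make the contrast concrete, below is a minimal sketch (not the paper's method) of the conventional data-splitting baseline described above: a Gaussian-kernel MMD two-sample test in which the bandwidth is selected on one half of the sample and the test, calibrated by permutations, is run on the held-out half. The function names, the bandwidth grid, and the permutation calibration are illustrative assumptions, not details taken from the paper.

# Sketch of the data-splitting MMD two-sample test baseline (assumptions noted above).
import numpy as np

def gaussian_kernel(A, B, bandwidth):
    # Pairwise Gaussian kernel matrix k(a, b) = exp(-||a - b||^2 / (2 * bandwidth^2)).
    sq = np.sum(A**2, 1)[:, None] + np.sum(B**2, 1)[None, :] - 2 * A @ B.T
    return np.exp(-sq / (2 * bandwidth**2))

def mmd2_biased(X, Y, bandwidth):
    # Biased (V-statistic) estimate of MMD^2 between samples X and Y.
    Kxx = gaussian_kernel(X, X, bandwidth)
    Kyy = gaussian_kernel(Y, Y, bandwidth)
    Kxy = gaussian_kernel(X, Y, bandwidth)
    return Kxx.mean() + Kyy.mean() - 2 * Kxy.mean()

def split_mmd_test(X, Y, bandwidths, n_perms=200, alpha=0.05, seed=0):
    rng = np.random.default_rng(seed)
    # 1) Data splitting: one half for hyperparameter selection, one half for testing.
    n = min(len(X), len(Y)) // 2
    X_sel, X_te = X[:n], X[n:2 * n]
    Y_sel, Y_te = Y[:n], Y[n:2 * n]
    # 2) Pick the bandwidth maximizing the MMD^2 estimate on the selection half
    #    (a simple proxy for test power).
    bw = max(bandwidths, key=lambda b: mmd2_biased(X_sel, Y_sel, b))
    # 3) Test on the held-out half; the split keeps the statistic independent of
    #    the selection step, so a standard permutation null is valid.
    stat = mmd2_biased(X_te, Y_te, bw)
    Z = np.vstack([X_te, Y_te])
    null = []
    for _ in range(n_perms):
        perm = rng.permutation(len(Z))
        null.append(mmd2_biased(Z[perm[:n]], Z[perm[n:]], bw))
    threshold = np.quantile(null, 1 - alpha)
    return stat, threshold, stat > threshold

# Example: two Gaussian samples that differ in mean.
X = np.random.default_rng(1).normal(0.0, 1.0, size=(200, 2))
Y = np.random.default_rng(2).normal(0.5, 1.0, size=(200, 2))
print(split_mmd_test(X, Y, bandwidths=[0.5, 1.0, 2.0]))

Because the selection half and the test half are disjoint, no correction for the bandwidth selection is needed, but the test only ever sees half of the data. That loss of test sample size is exactly the power reduction the proposed selective-inference approach avoids by learning and testing on the full sample.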
