Learning Deep Kernels for Non-Parametric Two-Sample Tests

We propose a class of kernel-based two-sample tests, which aim to determine whether two sets of samples are drawn from the same distribution. Our tests are constructed from kernels parameterized by deep neural nets, trained to maximize test power. These tests adapt to variations in distribution smoothness and shape over space, and are especially suited to high dimensions and complex data. By contrast, the simpler kernels used in prior kernel testing work are spatially homogeneous and adaptive only in lengthscale. We explain how this scheme includes popular classifier-based two-sample tests as a special case, but improves on them in general. We provide the first proof of consistency for the proposed adaptation method, which applies both to kernels on deep features and to simpler radial basis kernels or multiple kernel learning. In experiments, we establish the superior performance of our deep kernels in hypothesis testing on benchmark and real-world data. The code for our deep-kernel two-sample tests is available at this https URL.
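To make "a kernel parameterized by a deep net, trained to maximize test power" concrete, here is a minimal PyTorch sketch of the ingredients such a test needs: a kernel built on learned features, an unbiased MMD estimate, and a differentiable power-style objective. The abstract fixes only the overall recipe; the particular mixing of a feature-space kernel with a raw-input kernel, and the variance-normalized objective, are common choices we assume here for illustration, and phi stands for any feature-extractor network.

```python
import torch

def gaussian_kernel(x, y, sigma):
    """Gaussian (RBF) kernel matrix between the rows of x and y."""
    d2 = torch.cdist(x, y) ** 2
    return torch.exp(-d2 / (2 * sigma ** 2))

def deep_kernel(x, y, phi, sigma_feat, sigma_raw, eps=0.1):
    """Kernel on deep features phi(.), mixed with a small Gaussian
    component on the raw inputs so the overall kernel stays
    characteristic. (This mixing form is an assumption, not stated
    in the abstract.)"""
    k_feat = gaussian_kernel(phi(x), phi(y), sigma_feat)
    k_raw = gaussian_kernel(x, y, sigma_raw)
    return ((1 - eps) * k_feat + eps) * k_raw

def mmd2_unbiased(kxx, kyy, kxy):
    """Unbiased estimator of squared MMD from kernel matrices."""
    n, m = kxx.shape[0], kyy.shape[0]
    term_xx = (kxx.sum() - kxx.diagonal().sum()) / (n * (n - 1))
    term_yy = (kyy.sum() - kyy.diagonal().sum()) / (m * (m - 1))
    return term_xx + term_yy - 2 * kxy.mean()

def power_criterion(x, y, phi, sigma_feat, sigma_raw, eps=0.1, reg=1e-8):
    """Differentiable surrogate for asymptotic test power:
    MMD^2 divided by an estimate of its standard deviation under
    the alternative. Assumes x and y have equal sample sizes."""
    kxx = deep_kernel(x, x, phi, sigma_feat, sigma_raw, eps)
    kyy = deep_kernel(y, y, phi, sigma_feat, sigma_raw, eps)
    kxy = deep_kernel(x, y, phi, sigma_feat, sigma_raw, eps)
    mmd2 = mmd2_unbiased(kxx, kyy, kxy)
    # H_ij = k(xi,xj) + k(yi,yj) - k(xi,yj) - k(xj,yi)
    h = kxx + kyy - kxy - kxy.t()
    n = h.shape[0]
    var = 4.0 / n**3 * (h.sum(1) ** 2).sum() - 4.0 / n**4 * h.sum() ** 2
    # Maximize this ratio w.r.t. phi's weights and the bandwidths.
    return mmd2 / var.clamp_min(reg).sqrt()
```

In use, one would maximize power_criterion with a standard optimizer over the network weights and kernel bandwidths on a training split, then calibrate the final test on a held-out split, typically via a permutation test with the learned kernel.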
