Efficient Statistical Tests: A Neural Tangent Kernel Approach

For machine learning models to make reliable predictions in deployment, one needs to ensure that previously unseen test samples are sufficiently similar to the training data. Commonly used shift-invariant kernels lack compositionality and fail to capture the invariances present in high-dimensional data such as natural images. We propose an outlier detector and two-sample tests based on a shift-invariant convolutional neural tangent kernel (SCNTK) and the maximum mean discrepancy (MMD), which cost O(n) in the number of samples thanks to a random feature approximation. On MNIST and CIFAR-10 under various types of dataset shift, we show empirically that statistical tests with such compositional kernels, inherited from infinitely wide neural networks, achieve higher detection accuracy than existing non-parametric methods. Our method also provides a competitive alternative to adaptive kernel methods that require a training phase.
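To make the linear-time construction concrete, the sketch below estimates MMD^2 from random Fourier features of a generic Gaussian shift-invariant kernel rather than the SCNTK itself; the feature dimension, bandwidth, and toy Gaussian data are illustrative assumptions, not the paper's setup. With an explicit feature map, MMD^2 reduces to the squared distance between the two empirical feature means, so the cost is linear in the sample size.

```python
# A minimal sketch (assumptions: Gaussian kernel, not the paper's SCNTK):
# linear-time MMD^2 via random Fourier features (Rahimi & Recht, 2007).
import numpy as np

def random_fourier_features(X, W, b):
    """Map samples X of shape (n, d) to D-dimensional random Fourier features
    approximating a Gaussian shift-invariant kernel."""
    D = W.shape[1]
    return np.sqrt(2.0 / D) * np.cos(X @ W + b)

def mmd2_rff(X, Y, D=512, sigma=1.0, seed=0):
    """Biased MMD^2 estimate in O((n + m) * d * D) time: squared distance
    between the empirical means of the two feature embeddings."""
    rng = np.random.default_rng(seed)
    d = X.shape[1]
    # Frequencies sampled from the kernel's spectral density, random phases in [0, 2*pi).
    W = rng.normal(scale=1.0 / sigma, size=(d, D))
    b = rng.uniform(0.0, 2.0 * np.pi, size=D)
    diff = random_fourier_features(X, W, b).mean(0) - random_fourier_features(Y, W, b).mean(0)
    return float(diff @ diff)

# Usage: samples from the same distribution give a small MMD^2,
# while a mean-shifted sample gives a noticeably larger one.
rng = np.random.default_rng(1)
X = rng.normal(size=(1000, 32))
Y_same = rng.normal(size=(1000, 32))
Y_shift = rng.normal(loc=0.5, size=(1000, 32))
print(mmd2_rff(X, Y_same), mmd2_rff(X, Y_shift))
```

In practice the test threshold would be calibrated, e.g. with a permutation test over the pooled samples; swapping in SCNTK-derived random features in place of the Gaussian ones is the step specific to the proposed method.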
