Two-sample Testing Using Deep Learning

We propose a two-sample testing procedure based on learned deep neural network representations. To this end, we define two test statistics that perform an asymptotic location test on data samples mapped onto a hidden layer. The tests are consistent and asymptotically control the type-1 error rate. The test statistics can be evaluated in linear time in the sample size. Suitable data representations are obtained in a data-driven way by solving a supervised or unsupervised transfer-learning task on an auxiliary (potentially distinct) data set. If no auxiliary data is available, we split the data into two chunks: one for learning representations and one for computing the test statistic. In experiments on audio samples, natural images, and three-dimensional neuroimaging data, our tests yield significant decreases in type-2 error rate (up to 35 percentage points) compared to state-of-the-art two-sample tests such as kernel methods and classifier two-sample tests.
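
The following is a minimal sketch of the data-splitting variant described above, under stated assumptions: a pre-fitted feature map stands in for the transfer-learned hidden layer, and the null distribution is approximated by random permutations rather than the asymptotic approximation the paper's statistics use. All names (two_sample_location_test, feature_map) are hypothetical, introduced here for illustration only.

```python
import numpy as np

def two_sample_location_test(X, Y, feature_map, n_permutations=1000, seed=0):
    """Location test on hidden-layer representations.

    The statistic is the squared Euclidean distance between the feature
    means of the two samples, which is computable in time linear in the
    sample size. The null distribution is approximated here by random
    permutations; the paper's tests use an asymptotic approximation instead.
    """
    rng = np.random.default_rng(seed)
    HX, HY = feature_map(X), feature_map(Y)  # map data onto a hidden layer
    H = np.vstack([HX, HY])
    n = len(HX)

    def statistic(perm):
        # squared distance between the two groups' feature means
        return np.sum((H[perm[:n]].mean(axis=0) - H[perm[n:]].mean(axis=0)) ** 2)

    idx = np.arange(len(H))
    observed = statistic(idx)
    null_stats = [statistic(rng.permutation(idx)) for _ in range(n_permutations)]
    # "+1" correction keeps the permutation p-value valid in finite samples
    p_value = (1 + sum(s >= observed for s in null_stats)) / (1 + n_permutations)
    return observed, p_value


# Illustration: a fixed random ReLU layer stands in for a representation
# learned on auxiliary data (or on a held-out chunk of the data itself).
rng = np.random.default_rng(42)
W = rng.normal(size=(50, 20))  # input dim 50, hidden dim 20
feature_map = lambda Z: np.maximum(Z @ W, 0.0)

X = rng.normal(size=(200, 50))          # sample from P
Y = rng.normal(size=(200, 50)) + 0.25   # sample from Q (mean-shifted)
stat, p = two_sample_location_test(X, Y, feature_map)
print(f"statistic = {stat:.4f}, p-value = {p:.4f}")
```

Note that the feature map must be fixed before the test statistic is computed on the test chunk; reusing the same data for both representation learning and testing would invalidate the type-1 error control.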
