Testing Hypotheses by Regularized Maximum Mean Discrepancy

Do two data samples come from different distributions? Recent studies of this fundamental problem have focused on embedding probability distributions into sufficiently rich, characteristic Reproducing Kernel Hilbert Spaces (RKHSs), so that distributions can be compared via the distance between their embeddings. We show that Regularized Maximum Mean Discrepancy (RMMD), our novel measure for kernel-based hypothesis testing, yields substantial improvements even when sample sizes are small, and excels at hypothesis tests involving multiple comparisons with power control. We derive asymptotic distributions under the null and alternative hypotheses, and assess power control. Outstanding results are obtained on challenging EEG data, MNIST, the Berkeley Covertype dataset, and the Flare-Solar benchmark.
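The quantity underlying this family of tests is the Maximum Mean Discrepancy: the distance between the kernel mean embeddings of the two distributions. The abstract does not specify the RMMD regularization itself, so the sketch below shows only the standard unbiased MMD² statistic with a Gaussian kernel (a common default; the kernel choice and bandwidth are assumptions, not the paper's prescription):

```python
import numpy as np

def gaussian_kernel(X, Y, sigma=1.0):
    """Gaussian (RBF) kernel matrix between row-sample arrays X and Y."""
    # Pairwise squared Euclidean distances via the expansion ||x-y||^2 = ||x||^2 + ||y||^2 - 2<x,y>.
    d2 = (X ** 2).sum(1)[:, None] + (Y ** 2).sum(1)[None, :] - 2.0 * X @ Y.T
    return np.exp(-d2 / (2.0 * sigma ** 2))

def mmd2_unbiased(X, Y, sigma=1.0):
    """Unbiased estimate of squared MMD between the samples X and Y."""
    m, n = len(X), len(Y)
    Kxx = gaussian_kernel(X, X, sigma)
    Kyy = gaussian_kernel(Y, Y, sigma)
    Kxy = gaussian_kernel(X, Y, sigma)
    # Drop the diagonal self-similarity terms to make the estimator unbiased.
    term_xx = (Kxx.sum() - np.trace(Kxx)) / (m * (m - 1))
    term_yy = (Kyy.sum() - np.trace(Kyy)) / (n * (n - 1))
    term_xy = Kxy.sum() / (m * n)
    return term_xx + term_yy - 2.0 * term_xy
```

Under the null (same distribution) the statistic concentrates near zero, while a mean shift drives it up; a permutation procedure or the asymptotic null distribution then converts it into a p-value.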
