A Kernel Two-Sample Test

We propose a framework for analyzing and comparing distributions, which we use to construct statistical tests to determine if two samples are drawn from different distributions. Our test statistic is the largest difference in expectations over functions in the unit ball of a reproducing kernel Hilbert space (RKHS), and is called the maximum mean discrepancy (MMD). We present two distribution-free tests based on large deviation bounds for the MMD, and a third test based on the asymptotic distribution of this statistic. The MMD can be computed in quadratic time, although efficient linear-time approximations are available. Our statistic is an instance of an integral probability metric, and various classical metrics on distributions are obtained when alternative function classes are used in place of an RKHS. We apply our two-sample tests to a variety of problems, including attribute matching for databases using the Hungarian marriage method, where they perform strongly. Excellent performance is also obtained when comparing distributions over graphs, for which these are the first such tests.
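To make the quadratic-time and linear-time computations mentioned above concrete, the following is a minimal sketch of two standard estimators of the squared MMD, not the authors' implementation. It assumes a Gaussian kernel with a fixed bandwidth `sigma` and samples given as tuples of floats. The function names (`gaussian_kernel`, `mmd2_unbiased`, `mmd2_linear`) and the toy data are illustrative choices, not taken from the paper.

```python
import math
import random


def gaussian_kernel(x, y, sigma=1.0):
    """Gaussian RBF kernel between two equal-length tuples of floats.

    The kernel choice and the bandwidth sigma are assumptions of this sketch.
    """
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-sq_dist / (2.0 * sigma ** 2))


def mmd2_unbiased(xs, ys, kernel=gaussian_kernel):
    """Quadratic-time unbiased estimate of the squared MMD.

    Within-sample sums skip the diagonal (i == j), so the estimate
    can be slightly negative when the two samples are similar.
    """
    m, n = len(xs), len(ys)
    if m < 2 or n < 2:
        raise ValueError("need at least two points in each sample")
    k_xx = sum(kernel(xs[i], xs[j]) for i in range(m) for j in range(m) if i != j)
    k_yy = sum(kernel(ys[i], ys[j]) for i in range(n) for j in range(n) if i != j)
    k_xy = sum(kernel(x, y) for x in xs for y in ys)
    return k_xx / (m * (m - 1)) + k_yy / (n * (n - 1)) - 2.0 * k_xy / (m * n)


def mmd2_linear(xs, ys, kernel=gaussian_kernel):
    """Linear-time estimate: average a four-point term over disjoint pairs.

    Uses only the first 2 * (min(len(xs), len(ys)) // 2) points of each sample.
    """
    pairs = min(len(xs), len(ys)) // 2
    if pairs < 1:
        raise ValueError("need at least two points in each sample")
    total = 0.0
    for i in range(pairs):
        x1, x2 = xs[2 * i], xs[2 * i + 1]
        y1, y2 = ys[2 * i], ys[2 * i + 1]
        total += kernel(x1, x2) + kernel(y1, y2) - kernel(x1, y2) - kernel(x2, y1)
    return total / pairs


# Toy data (hypothetical): two samples from N(0, 1) and one from N(3, 1).
rng = random.Random(0)
sample_p = [(rng.gauss(0.0, 1.0),) for _ in range(50)]
sample_p2 = [(rng.gauss(0.0, 1.0),) for _ in range(50)]
sample_q = [(rng.gauss(3.0, 1.0),) for _ in range(50)]

same = mmd2_unbiased(sample_p, sample_p2)
shifted = mmd2_unbiased(sample_p, sample_q)
print("same distribution:   ", same)
print("shifted distribution:", shifted)
print("linear-time, shifted:", mmd2_linear(sample_p, sample_q))
```

The estimates alone do not constitute a test: a decision rule still needs a threshold, which the paper obtains from large deviation bounds or from the asymptotic distribution of the statistic. The sketch only shows how the statistic itself might be evaluated.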

[1]  J. Wilkins A Note on Skewness and Kurtosis , 1944 .

[2]  W. Hoeffding A Class of Statistics with Asymptotically Normal Distribution , 1948 .

[3]  Feller William,et al.  An Introduction To Probability Theory And Its Applications , 1950 .

[4]  R. Fortet,et al.  Convergence de la répartition empirique vers la répartition théorique , 1953 .

[5]  W. Hoeffding Probability Inequalities for sums of Bounded Random Variables , 1963 .

[6]  William Feller,et al.  An Introduction to Probability Theory and Its Applications , 1967 .

[7]  P. Bickel A Distribution Free Version of the Smirnov Two Sample Test in the $p$-Variate Case , 1969 .

[8]  Mark S. C. Reed,et al.  Method of Modern Mathematical Physics , 1972 .

[9]  M. Reed Methods of Modern Mathematical Physics. I: Functional Analysis , 1972 .

[10]  J. Friedman,et al.  Multivariate generalizations of the Wald--Wolfowitz and Smirnov two-sample tests , 1979 .

[11]  Ing Rj Ser Approximation Theorems of Mathematical Statistics , 1980 .

[12]  G. Grimmett,et al.  Probability and random processes , 2002 .

[13]  Colin McDiarmid,et al.  Surveys in Combinatorics, 1989: On the method of bounded differences , 1989 .

[14]  F. A. Seiler,et al.  Numerical Recipes in C: The Art of Scientific Computing , 1989 .

[15]  E. Giné,et al.  On the Bootstrap of $U$ and $V$ Statistics , 1992 .

[16]  A. Feuerverger,et al.  A Consistent Test for Bivariate Dependence , 1993 .

[17]  N. H. Anderson,et al.  Two-sample test statistics for measuring discrepancies between two multivariate probability density functions using kernel-based density estimates , 1994 .

[18]  F. Famoye Continuous Univariate Distributions, Volume 1 , 1994 .

[19]  Pierre Comon,et al.  Independent component analysis, A new concept? , 1994, Signal Process..

[20]  W. Press,et al.  Numerical Recipes in Fortran: The Art of Scientific Computing.@@@Numerical Recipes in C: The Art of Scientific Computing. , 1994 .

[21]  N. L. Johnson,et al.  Continuous Univariate Distributions. , 1995 .

[22]  Jon A. Wellner,et al.  Weak Convergence and Empirical Processes: With Applications to Statistics , 1996 .

[23]  Bernhard Schölkopf,et al.  Support vector learning , 1997 .

[24]  A. Müller Integral Probability Metrics and Their Generating Classes of Functions , 1997, Advances in Applied Probability.

[25]  E. Giné,et al.  Decoupling: From Dependence to Independence , 1998 .

[26]  Alexander J. Smola,et al.  Learning with kernels , 1998 .

[27]  Catherine Blake,et al.  UCI Repository of machine learning databases , 1998 .

[28]  N. Henze,et al.  On the multivariate runs test , 1999 .

[29]  Robert P. W. Duin,et al.  Data domain description using support vectors , 1999, ESANN.

[30]  Bernard Chazelle,et al.  A minimum spanning tree algorithm with inverse-Ackermann type complexity , 2000, JACM.

[31]  B. Schölkopf,et al.  Sparse Greedy Matrix Approximation for Machine Learning , 2000, ICML.

[32]  Christopher K. I. Williams,et al.  Using the Nyström Method to Speed Up Kernel Machines , 2000, NIPS.

[33]  Peter L. Bartlett,et al.  Rademacher and Gaussian Complexities: Risk Bounds and Structural Results , 2003, J. Mach. Learn. Res..

[34]  Katya Scheinberg,et al.  Efficient SVM Training Using Low-Rank Kernel Representations , 2002, J. Mach. Learn. Res..

[35]  Bernhard Schölkopf,et al.  Estimating the Support of a High-Dimensional Distribution , 2001, Neural Computation.

[36]  Ingo Steinwart,et al.  On the Influence of the Kernel on the Consistency of Support Vector Machines , 2002, J. Mach. Learn. Res..

[37]  Thomas Hofmann,et al.  Support Vector Machines for Multiple-Instance Learning , 2002, NIPS.

[38]  Thomas Gärtner,et al.  Multi-Instance Kernels , 2002, ICML.

[39]  P.J.W. Rayner,et al.  Optimized support vector machines for nonstationary signal classification , 2002, IEEE Signal Processing Letters.

[40]  Dudley,et al.  Real Analysis and Probability: Measurability: Borel Isomorphism and Analytic Sets , 2002 .

[41]  P. Hall,et al.  Permutation tests for equality of distributions in high‐dimensional settings , 2002 .

[42]  José Carlos Príncipe,et al.  Information Theoretic Clustering , 2002, IEEE Trans. Pattern Anal. Mach. Intell..

[43]  Michael I. Jordan,et al.  Kernel independent component analysis , 2003, 2003 IEEE International Conference on Acoustics, Speech, and Signal Processing, 2003. Proceedings. (ICASSP '03)..

[44]  Nello Cristianini,et al.  Kernel Methods for Pattern Analysis , 2003, ICTAI.

[45]  Bernhard Schölkopf,et al.  Kernel Methods in Computational Biology , 2005 .

[46]  R. Kondor,et al.  Bhattacharyya and Expected Likelihood Kernels , 2003 .

[47]  Michael I. Jordan,et al.  Dimensionality Reduction for Supervised Learning with Reproducing Kernel Hilbert Spaces , 2004, J. Mach. Learn. Res..

[48]  Matthias Hein,et al.  Hilbertian Metrics on Probability Measures and Their Application in SVM?s , 2004, DAGM-Symposium.

[49]  A. Berlinet,et al.  Reproducing kernel Hilbert spaces in probability and statistics , 2004 .

[50]  J. Friedman On Multivariate Goodness-of-Fit and Two-Sample Testing , 2004 .

[51]  Leonidas J. Guibas,et al.  The Earth Mover's Distance as a Metric for Image Retrieval , 2000, International Journal of Computer Vision.

[52]  Miroslav Dudík,et al.  Performance Guarantees for Regularized Maximum Entropy Density Estimation , 2004, COLT.

[53]  Shai Ben-David,et al.  Detecting Change in Data Streams , 2004, VLDB.

[54]  O. Bousquet THEORY OF CLASSIFICATION: A SURVEY OF RECENT ADVANCES , 2004 .

[55]  Bernhard Schölkopf,et al.  Measuring Statistical Dependence with Hilbert-Schmidt Norms , 2005, ALT.

[56]  Stephen E. Fienberg,et al.  Testing Statistical Hypotheses , 2005 .

[57]  Baver Okutmustur Reproducing kernel Hilbert spaces , 2005 .

[58]  László Györfi,et al.  On the asymptotic properties of a nonparametric L/sub 1/-test statistic of homogeneity , 2005, IEEE Transactions on Information Theory.

[59]  Bernhard Schölkopf,et al.  Kernel Methods for Measuring Independence , 2005, J. Mach. Learn. Res..

[60]  P. Rosenbaum An exact distribution‐free test comparing two multivariate distributions based on adjacency , 2005 .

[61]  Hans-Peter Kriegel,et al.  Protein function prediction via graph kernels , 2005, ISMB.

[62]  S. Boucheron,et al.  Theory of classification : a survey of some recent advances , 2005 .

[63]  Ali Esmaili,et al.  Probability and Random Processes , 2005, Technometrics.

[64]  Alexander J. Smola,et al.  Nonparametric Quantile Estimation , 2006, J. Mach. Learn. Res..

[65]  Yuesheng Xu,et al.  Universal Kernels , 2006, J. Mach. Learn. Res..

[66]  Larry Wasserman,et al.  All of Nonparametric Statistics (Springer Texts in Statistics) , 2006 .

[67]  Koby Crammer,et al.  Analysis of Representations for Domain Adaptation , 2006, NIPS.

[68]  Bernhard Schölkopf,et al.  A Kernel Method for the Two-Sample-Problem , 2006, NIPS.

[69]  Miroslav Dudík,et al.  Maximum Entropy Distribution Estimation with Generalized Regularization , 2006, COLT.

[70]  Hans-Peter Kriegel,et al.  Integrating structured biological data by Kernel Maximum Mean Discrepancy , 2006, ISMB.

[71]  Alexander J. Smola,et al.  Unifying Divergence Minimization and Statistical Inference Via Convex Duality , 2006, COLT.

[72]  Le Song,et al.  A Kernel Statistical Test of Independence , 2007, NIPS.

[73]  Stergios B. Fotopoulos,et al.  All of Nonparametric Statistics , 2007, Technometrics.

[74]  Bernhard Schölkopf,et al.  Kernel Measures of Conditional Dependence , 2007, NIPS.

[75]  Le Song,et al.  A Hilbert Space Embedding for Distributions , 2007, Discovery Science.

[76]  John Shawe-Taylor,et al.  A Framework for Probability Density Estimation , 2007, AISTATS.

[77]  Susan A. Murphy,et al.  Monographs on statistics and applied probability , 1990 .

[78]  Martin J. Wainwright,et al.  Estimating divergence functionals and the likelihood ratio by penalized convex risk minimization , 2007, NIPS.

[79]  Zaïd Harchaoui,et al.  Testing for Homogeneity with Kernel Fisher Discriminant Analysis , 2007, NIPS.

[80]  Bernhard Schölkopf,et al.  A Kernel Approach to Comparing Distributions , 2007, AAAI.

[81]  Andreas Christmann,et al.  Support vector machines , 2008, Data Mining and Knowledge Discovery Handbook.

[82]  Mehryar Mohri,et al.  Sample Selection Bias Correction Theory , 2008, ALT.

[83]  Bernhard Schölkopf,et al.  Injective Hilbert Space Embeddings of Probability Measures , 2008, COLT.

[84]  Le Song,et al.  Tailoring density estimation via reproducing kernel moment matching , 2008, ICML '08.

[85]  Arthur Gretton,et al.  Inferring spike trains from local field potentials. , 2008, Journal of neurophysiology.

[86]  Bernhard Schölkopf,et al.  Kernel Choice and Classifiability for RKHS Embeddings of Probability Distributions , 2009, NIPS.

[87]  Zaïd Harchaoui,et al.  A Fast, Consistent Kernel Two-Sample Test , 2009, NIPS.

[88]  Hao Shen,et al.  Fast Kernel-Based Independent Component Analysis , 2009, IEEE Transactions on Signal Processing.

[89]  Bernhard Schölkopf,et al.  Non-parametric estimation of integral probability metrics , 2010, 2010 IEEE International Symposium on Information Theory.

[90]  Kenji Fukumizu,et al.  Universality, Characteristic Kernels and RKHS Embedding of Measures , 2010, J. Mach. Learn. Res..

[91]  Harold W. Kuhn,et al.  The Hungarian method for the assignment problem , 1955, 50 Years of Integer Programming.

[92]  Bernhard Schölkopf,et al.  Hilbert Space Embeddings and Metrics on Probability Measures , 2009, J. Mach. Learn. Res..

[93]  Kenji Fukumizu,et al.  Learning in Hilbert vs. Banach Spaces: A Measure Embedding Viewpoint , 2011, NIPS.

[94]  Mark D. Reid,et al.  Information, Divergence and Risk for Binary Experiments , 2009, J. Mach. Learn. Res..

[95]  Robert L. Wolpert,et al.  Statistical Inference , 2019, Encyclopedia of Social Network Analysis and Mining.