Metrizing Weak Convergence with Maximum Mean Discrepancies

Theorem 12 of Simon-Gabriel & Schölkopf (JMLR, 2018) seemed to close a 40-year-old quest to characterize maximum mean discrepancies (MMD) that metrize the weak convergence of probability measures. We prove, however, that the theorem is incorrect and provide a correction. We show that, on a locally compact, non-compact, Hausdorff space, the MMD of a bounded continuous Borel measurable kernel k, whose RKHS-functions vanish at infinity, metrizes the weak convergence of probability measures if and only if k is continuous and integrally strictly positive definite (ISPD) over all signed, finite, regular Borel measures. We also show that, contrary to the claim of the aforementioned Theorem 12, there exist both bounded continuous ISPD kernels that do not metrize weak convergence and bounded continuous non-ISPD kernels that do metrize it.
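
For readers who want the objects in the abstract written out, the standard definitions can be sketched as follows. The notation is an assumption of this sketch and does not come from the abstract: \mathcal{X} is the underlying space, \mathcal{H}_k the RKHS of the kernel k, P and Q are probability measures on \mathcal{X}, and \mu is a finite signed regular Borel measure.

```latex
% MMD as an integral probability metric over the unit ball of the RKHS,
% and equivalently as the RKHS distance between kernel mean embeddings
% (well defined when k is bounded and measurable).
\[
  \mathrm{MMD}_k(P, Q)
  = \sup_{\substack{f \in \mathcal{H}_k \\ \|f\|_{\mathcal{H}_k} \le 1}}
    \left| \int_{\mathcal{X}} f \, dP - \int_{\mathcal{X}} f \, dQ \right|
  = \left\| \int_{\mathcal{X}} k(\cdot, x) \, dP(x)
          - \int_{\mathcal{X}} k(\cdot, x) \, dQ(x) \right\|_{\mathcal{H}_k}
\]

% Squared MMD as a double integral against the signed measure P - Q.
\[
  \mathrm{MMD}_k(P, Q)^2
  = \iint_{\mathcal{X} \times \mathcal{X}} k(x, y) \, d(P - Q)(x) \, d(P - Q)(y)
\]

% Integral strict positive definiteness (ISPD).
\[
  \iint_{\mathcal{X} \times \mathcal{X}} k(x, y) \, d\mu(x) \, d\mu(y) > 0
  \quad \text{for every nonzero finite signed regular Borel measure } \mu
\]

% "MMD_k metrizes weak convergence" means, for probability measures P_n and P:
\[
  \mathrm{MMD}_k(P_n, P) \to 0
  \iff
  P_n \to P \text{ weakly}
\]
```

In this notation, the ISPD condition says that the double-integral form is strictly positive on all nonzero signed measures, not only on differences P - Q of probability measures.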

[1]  F. Trèves. Topological vector spaces, distributions and kernels, 1967.

[2]  C. Guilbart. Étude des produits scalaires sur l'espace des mesures ; Estimation par projections ; Tests à noyaux, 1978.

[3]  C. Berg, et al. Harmonic Analysis on Semigroups: Theory of Positive Definite and Related Functions, 1984.

[4]  Š. Schwabik, et al. Topics In Banach Space Integration, 2005.

[5]  Le Song, et al. A Hilbert Space Embedding for Distributions, 2007, Discovery Science.

[6]  Alexander J. Smola, et al. Super-Samples from Kernel Herding, 2010, UAI.

[7]  Bernhard Schölkopf, et al. Hilbert Space Embeddings and Metrics on Probability Measures, 2009, J. Mach. Learn. Res.

[8]  David Duvenaud, et al. Optimally-Weighted Herding is Bayesian Quadrature, 2012, UAI.

[9]  Bernhard Schölkopf, et al. A Kernel Two-Sample Test, 2012, J. Mach. Learn. Res.

[10]  Zoubin Ghahramani, et al. Training generative neural networks via Maximum Mean Discrepancy optimization, 2015, UAI.

[11]  Qiang Liu, et al. A Kernelized Stein Discrepancy for Goodness-of-fit Tests, 2016, ICML.

[12]  Dilin Wang, et al. Stein Variational Gradient Descent: A General Purpose Bayesian Inference Algorithm, 2016, NIPS.

[13]  Arthur Gretton, et al. A Kernel Test of Goodness of Fit, 2016, ICML.

[14]  Bharath K. Sriperumbudur. On the optimal estimation of probability measures in weak and strong topologies, 2013, arXiv:1310.8240.

[15]  Dilin Wang, et al. Learning to Draw Samples with Amortized Stein Variational Gradient Descent, 2017, UAI.

[16]  Kenji Fukumizu, et al. A Linear-Time Kernel Goodness-of-Fit Test, 2017, NIPS.

[17]  Zhe Gan, et al. VAE Learning via Stein Variational Gradient Descent, 2017, NIPS.

[18]  Lester W. Mackey, et al. Measuring Sample Quality with Kernels, 2017, ICML.

[19]  Qiang Liu, et al. Black-box Importance Sampling, 2016, AISTATS.

[20]  Yiming Yang, et al. MMD GAN: Towards Deeper Understanding of Moment Matching Network, 2017, NIPS.

[21]  Alexander J. Smola, et al. Generative Models and Model Criticism via Optimized Maximum Mean Discrepancy, 2016, ICLR.

[22]  Bernhard Schölkopf, et al. Kernel Distribution Embeddings: Universal Kernels, Characteristic Kernels and Kernel Metrics on Distributions, 2016, J. Mach. Learn. Res.

[23]  Lester Mackey, et al. Random Feature Stein Discrepancies, 2018, NeurIPS.

[24]  I. Chevyrev, et al. Signature Moments to Characterize Laws of Stochastic Processes, 2018, J. Mach. Learn. Res.

[25]  Lester W. Mackey, et al. Stein Points, 2018, ICML.

[26]  François-Xavier Briol, et al. Stein Point Markov Chain Monte Carlo, 2019, ICML.

[27]  Alessandro Barp, et al. Statistical Inference for Generative Models with Maximum Mean Discrepancy, 2019, arXiv.

[28]  Masashi Sugiyama, et al. Bayesian Posterior Approximation via Greedy Particle Optimization, 2018, AAAI.

[29]  Zhitang Chen, et al. Universal Hypothesis Testing with Kernels: Asymptotically Optimal Tests for Goodness of Fit, 2018, AISTATS.

[30]  Abdul Fatir Ansari, et al. A Characteristic Function Approach to Deep Implicit Generative Modeling, 2019, 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[31]  Liam Hodgkinson, et al. The reproducing Stein kernel approach for post-hoc corrected sampling, 2020, arXiv:2001.09266.

[32]  Zhitang Chen, et al. Asymptotically Optimal One- and Two-Sample Testing With Kernels, 2019, IEEE Transactions on Information Theory.

[33]  Jon Cockayne, et al. Optimal thinning of MCMC output, 2020, Journal of the Royal Statistical Society: Series B (Statistical Methodology).