Random Feature Stein Discrepancies

Computable Stein discrepancies have been deployed for a variety of applications, ranging from sampler selection in posterior inference to approximate Bayesian inference to goodness-of-fit testing. Existing convergence-determining Stein discrepancies admit strong theoretical guarantees but suffer from a computational cost that grows quadratically in the sample size. While linear-time Stein discrepancies have been proposed for goodness-of-fit testing, they exhibit avoidable degradations in testing power, even when power is explicitly optimized. To address these shortcomings, we introduce feature Stein discrepancies (ΦSDs), a new family of quality measures that can be cheaply approximated using importance sampling. We show how to construct ΦSDs that provably determine the convergence of a sample to its target, and we develop high-accuracy approximations, random ΦSDs (RΦSDs), which are computable in near-linear time. In our experiments with sampler selection for approximate posterior inference and goodness-of-fit testing, RΦSDs perform as well as or better than quadratic-time kernel Stein discrepancies (KSDs) while being orders of magnitude faster to compute.
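To make the quadratic cost mentioned above concrete, the following is a minimal sketch of a standard quadratic-time KSD estimate. It is not the ΦSD or RΦSD construction introduced here. The Gaussian base kernel, the bandwidth `h`, the standard normal target, and the function name `ksd_squared_vstat` are all assumptions chosen for illustration.

```python
import numpy as np

def ksd_squared_vstat(x, score, h=1.0):
    """V-statistic estimate of a squared Langevin KSD (illustrative sketch).

    x     : (n, d) array of sample points
    score : function returning grad log p at each row of x, shape (n, d)
    h     : bandwidth of the Gaussian base kernel (an assumed choice)
    """
    n, d = x.shape
    s = score(x)                                   # (n, d) score at each point
    diff = x[:, None, :] - x[None, :, :]           # (n, n, d), x_i - x_j
    sqdist = np.sum(diff ** 2, axis=-1)            # (n, n) squared distances
    k = np.exp(-sqdist / (2.0 * h ** 2))           # Gaussian base kernel

    # Stein kernel = sum of four terms, evaluated for all n^2 pairs.
    term1 = (s @ s.T) * k                                     # s_i . s_j k
    term2 = np.einsum("id,ijd->ij", s, diff) / h ** 2 * k     # s_i . grad_y k
    term3 = -np.einsum("jd,ijd->ij", s, diff) / h ** 2 * k    # s_j . grad_x k
    term4 = (d / h ** 2 - sqdist / h ** 4) * k                # tr(grad_x grad_y k)

    # Averaging over all pairs is what makes the cost O(n^2 d).
    return float(np.mean(term1 + term2 + term3 + term4))

# Assumed target: standard normal, whose score is -x.
rng = np.random.default_rng(0)
good = rng.standard_normal((200, 2))   # drawn from the target
bad = good + 2.0                       # shifted away from the target
standard_normal_score = lambda pts: -pts

ksd_good = ksd_squared_vstat(good, standard_normal_score)
ksd_bad = ksd_squared_vstat(bad, standard_normal_score)
```

Every pair of sample points is visited once, so doubling the sample size roughly quadruples the work and memory; this is the scaling that near-linear-time approximations aim to avoid.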
