A Kernelized Stein Discrepancy for Goodness-of-fit Tests

We derive a new discrepancy statistic for measuring differences between two probability distributions by combining Stein's identity with reproducing kernel Hilbert space (RKHS) theory. We apply this result to test how well a probabilistic model fits a set of observations, and derive a new class of powerful goodness-of-fit tests that are widely applicable to complex, high-dimensional distributions. These tests apply even to models with computationally intractable normalization constants, because the statistic depends on the model only through its score function, the gradient of the log-density, in which the normalization constant cancels. We study both the theoretical and the empirical properties of our methods in detail.
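The idea can be illustrated with a short NumPy sketch. It assumes the standard form of the kernelized Stein discrepancy, S = E_{x,x'~q}[u_p(x,x')] with samples x from the data distribution q and model score s_p(x) = grad_x log p(x), where u_p(x,x') = s_p(x)^T k(x,x') s_p(x') + s_p(x)^T grad_{x'} k(x,x') + grad_x k(x,x')^T s_p(x') + trace(grad_x grad_{x'} k(x,x')). The choices here are illustrative assumptions, not the paper's exact settings: an RBF kernel k(x,y) = exp(-||x - y||^2 / h), a bandwidth h from the median heuristic, a standard Gaussian model, and a multinomial-weight bootstrap for the degenerate U-statistic. All function names (ksd_u_matrix, median_bandwidth, ksd_test, std_normal_score) are hypothetical.

```python
import numpy as np


def ksd_u_matrix(X, score, h):
    """Pairwise Stein kernel u_p(x_i, x_j) for the RBF kernel k(x, y) = exp(-||x - y||^2 / h).

    X: (n, D) samples; score: maps (n, D) samples to grad_x log p(x), shape (n, D).
    """
    n, D = X.shape
    S = score(X)                                  # model scores at the samples
    diff = X[:, None, :] - X[None, :, :]          # diff[i, j] = x_i - x_j
    r2 = np.sum(diff ** 2, axis=-1)               # ||x_i - x_j||^2
    K = np.exp(-r2 / h)
    ss = S @ S.T                                  # s(x_i) . s(x_j)
    s_diff = np.einsum("id,ijd->ij", S, diff)     # s(x_i) . (x_i - x_j)
    diff_s = np.einsum("ijd,jd->ij", diff, S)     # (x_i - x_j) . s(x_j)
    # s^T k s' + s^T grad_y k + (grad_x k)^T s' + trace(grad_x grad_y k), written out for the RBF kernel
    return K * (ss + (2.0 / h) * (s_diff - diff_s) + 2.0 * D / h - 4.0 * r2 / h ** 2)


def median_bandwidth(X):
    """Median heuristic (an assumption): median of pairwise squared distances."""
    r2 = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=-1)
    return np.median(r2[np.triu_indices(len(X), k=1)])


def ksd_test(X, score, n_boot=1000, seed=0):
    """Return the U-statistic estimate of the KSD and a bootstrap p-value."""
    rng = np.random.default_rng(seed)
    n = X.shape[0]
    U = ksd_u_matrix(X, score, median_bandwidth(X))
    np.fill_diagonal(U, 0.0)                      # the U-statistic excludes i == j terms
    stat = U.sum() / (n * (n - 1))
    boot = np.empty(n_boot)
    for b in range(n_boot):
        w = rng.multinomial(n, np.full(n, 1.0 / n)) / n
        v = w - 1.0 / n
        boot[b] = v @ U @ v                       # sum_{i != j} (w_i - 1/n)(w_j - 1/n) u_ij
    p_value = np.mean(boot >= stat)
    return stat, p_value


def std_normal_score(X):
    # grad_x log N(x; 0, I) = -x; the normalization constant never appears
    return -X


rng = np.random.default_rng(1)
X_null = rng.standard_normal((200, 2))            # drawn from the model itself
X_alt = X_null + 1.0                              # mean-shifted data, so the model is wrong
stat_null, p_null = ksd_test(X_null, std_normal_score)
stat_alt, p_alt = ksd_test(X_alt, std_normal_score)
print(f"null: KSD={stat_null:.4f}, p={p_null:.3f}")
print(f"shifted: KSD={stat_alt:.4f}, p={p_alt:.3f}")
```

Under the null the estimate fluctuates around zero, since Stein's identity makes E[u_p] vanish when the data come from p. For the shifted sample it is clearly positive and the bootstrap p-value is small. Only the score function enters, which is why the same code would work for an unnormalized model.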
