Goodness-of-fit Testing for Discrete Distributions via Stein Discrepancy

Recent work has combined Stein’s method with reproducing kernel Hilbert space theory to develop nonparametric goodness-of-fit tests for unnormalized probability distributions. However, the currently available tests apply exclusively to distributions with smooth density functions. In this work, we introduce a kernelized Stein discrepancy measure for discrete spaces, and develop a nonparametric goodness-of-fit test for discrete distributions with intractable normalization constants. Furthermore, we propose a general characterization of Stein operators that encompasses both discrete and continuous distributions, providing a recipe for constructing new Stein operators. We apply the proposed goodness-of-fit test to three statistical models involving discrete distributions, and our experiments show that the proposed test typically outperforms a two-sample test based on the maximum mean discrepancy.
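As a rough illustration of why a Stein operator can sidestep the normalization constant on a discrete space, the sketch below numerically checks a Stein identity on a tiny Ising-type model. It rests on assumptions that are not stated in the abstract: the state space is {-1, +1}^d, the "difference" acts by flipping one coordinate, and the score is defined through probability ratios p(flip_i x) / p(x). All function names (`unnormalized_ising`, `flip`, `stein_operator`) are hypothetical, and this is one plausible construction rather than the paper's definitive operator.

```python
import itertools
import numpy as np


def unnormalized_ising(x, J):
    """Unnormalized mass exp(x^T J x / 2); the normalizer is never needed below."""
    return np.exp(0.5 * x @ J @ x)


def flip(x, i):
    """Return a copy of x with coordinate i negated (assumed neighbor map)."""
    y = x.copy()
    y[i] = -y[i]
    return y


def stein_operator(f, x, J):
    """Apply an assumed discrete Stein operator to a scalar test function f.

    Component i is score_i(x) * f(x) - (f(x) - f(flip_i x)), where
    score_i(x) = 1 - p(flip_i x) / p(x) depends only on a ratio of masses.
    """
    d = len(x)
    out = np.empty(d)
    for i in range(d):
        xf = flip(x, i)
        ratio = unnormalized_ising(xf, J) / unnormalized_ising(x, J)
        score_i = 1.0 - ratio
        out[i] = score_i * f(x) - (f(x) - f(xf))
    return out


d = 3
rng = np.random.default_rng(0)
A = rng.normal(size=(d, d))
J = np.triu(A, 1)
J = J + J.T  # symmetric couplings with zero diagonal

# Enumerate all 2^d states; normalization is used only to verify the identity.
states = [np.array(s, dtype=float) for s in itertools.product([-1, 1], repeat=d)]
weights = np.array([unnormalized_ising(x, J) for x in states])
probs = weights / weights.sum()


def f(x):
    """An arbitrary test function on the state space."""
    return np.sin(x @ np.arange(1, d + 1)) + x[0] * x[1]


# Under the model, the expectation of the operator output should vanish.
expectation = sum(p * stein_operator(f, x, J) for p, x in zip(probs, states))
print(expectation)
```

The operator itself touches the model only through ratios of unnormalized masses, which is why a discrepancy built from it can be evaluated without the partition function. The exhaustive enumeration here is purely a sanity check and would not scale beyond small d.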
