A Kernel Stein Test for Comparing Latent Variable Models

We propose a nonparametric, kernel-based test to assess the relative goodness of fit of latent variable models with intractable unnormalized densities. Our test generalises the kernel Stein discrepancy (KSD) tests of Liu et al. (2016), Chwialkowski et al. (2016), Yang et al. (2018), and Jitkrittum et al. (2018), which required exact access to unnormalized densities. Our new test relies on the simple idea of using an approximate observed-variable marginal in place of the exact, intractable one. As our main theoretical contribution, we prove that the new test, with a properly corrected threshold, has a well-controlled type-I error. In the case of models with low-dimensional latent structure and high-dimensional observations, our test significantly outperforms the relative maximum mean discrepancy test (Bounliphone et al., 2015), which cannot exploit the latent structure.
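The idea of substituting an approximate marginal can be illustrated with a minimal sketch. It is not the authors' implementation, and it omits the relative test and its corrected threshold. The toy model (z ~ N(0, I), x | z ~ N(z, I)), the Gaussian kernel, and the helper names `ksd_u_statistic` and `approx_marginal_score` are all assumptions made for illustration. The sketch estimates the marginal score by averaging the conditional score over latent draws, then plugs it into a standard KSD U-statistic.

```python
import numpy as np

def ksd_u_statistic(x, score, bandwidth=1.0):
    """U-statistic estimate of the squared KSD with a Gaussian kernel.

    x: (n, d) samples; score: (n, d) values of grad_x log p at each sample.
    """
    n, d = x.shape
    h2 = bandwidth ** 2
    diff = x[:, None, :] - x[None, :, :]            # (n, n, d): x_i - x_j
    sqdist = np.sum(diff ** 2, axis=-1)
    k = np.exp(-sqdist / (2 * h2))
    term1 = (score @ score.T) * k                            # s_i . s_j k
    term2 = np.einsum("id,ijd->ij", score, diff) * k / h2    # s_i . grad_{x_j} k
    term3 = -np.einsum("jd,ijd->ij", score, diff) * k / h2   # s_j . grad_{x_i} k
    term4 = (d / h2 - sqdist / h2 ** 2) * k                  # trace of mixed second derivative
    stein_kernel = term1 + term2 + term3 + term4
    np.fill_diagonal(stein_kernel, 0.0)                      # U-statistic drops i == j
    return stein_kernel.sum() / (n * (n - 1))

def approx_marginal_score(x, rng, n_latent=200):
    """Monte Carlo estimate of grad_x log p(x) for the assumed toy model.

    Assumption: z ~ N(0, I), x | z ~ N(z, I), hence z | x ~ N(x / 2, I / 2).
    The marginal score equals the posterior mean of grad_x log p(x | z) = -(x - z).
    """
    n, d = x.shape
    z = x[None] / 2 + rng.normal(scale=np.sqrt(0.5), size=(n_latent, n, d))
    return np.mean(-(x[None] - z), axis=0)

rng = np.random.default_rng(0)
x = rng.normal(scale=np.sqrt(2.0), size=(300, 2))   # data drawn from the true marginal N(0, 2I)

score_hat = approx_marginal_score(x, rng)           # approximate score of the correct model
score_exact = -x / 2                                # closed form, available only in this toy case
score_wrong = -x                                    # score of a misspecified model N(0, I)

ksd_right = ksd_u_statistic(x, score_hat)
ksd_wrong = ksd_u_statistic(x, score_wrong)
print(f"correct model (approximate score): {ksd_right:.4f}")
print(f"misspecified model:                {ksd_wrong:.4f}")
```

In this toy case the posterior over z is available in closed form; for the intractable models the abstract targets, the latent draws would presumably come from an approximate sampler such as MCMC, which is where the approximation error enters.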

[1]  W. Hoeffding. A Class of Statistics with Asymptotically Normal Distribution, 1948.

[2]  H. Callaert, et al. The Berry-Esseen Theorem for $U$-Statistics, 1978.

[3]  T. Ferguson. Bayesian Density Estimation by Mixtures of Normal Distributions, 1983.

[4]  S. Duane, et al. Hybrid Monte Carlo, 1987.

[5]  Lawrence R. Rabiner, et al. A tutorial on hidden Markov models and selected applications in speech recognition, 1989, Proc. IEEE.

[6]  Radford M. Neal. Connectionist Learning of Belief Networks, 1992, Artif. Intell.

[7]  Peter Green, et al. Markov chain Monte Carlo in Practice, 1996.

[8]  Michael E. Tipping, et al. Probabilistic Principal Component Analysis, 1999.

[9]  Michael I. Jordan, et al. Latent Dirichlet Allocation, 2001, J. Mach. Learn. Res.

[10]  Stephen E. Fienberg, et al. Testing Statistical Hypotheses, 2005.

[11]  Christopher M. Bishop, et al. Pattern Recognition and Machine Learning (Information Science and Statistics), 2006.

[12]  Andreas Christmann, et al. Support vector machines, 2008, Data Mining and Knowledge Discovery Handbook.

[13]  C. Carmeli, et al. Vector valued reproducing kernel Hilbert spaces and universality, 2008, 0807.1659.

[14]  Nathan Ross. Fundamentals of Stein's method, 2011, 1109.1880.

[15]  Bernhard Schölkopf, et al. A Kernel Two-Sample Test, 2012, J. Mach. Learn. Res.

[16]  N. Chopin, et al. Control functionals for Monte Carlo integration, 2014, 1410.2392.

[17]  Andrew Gelman, et al. The No-U-turn sampler: adaptively setting path lengths in Hamiltonian Monte Carlo, 2011, J. Mach. Learn. Res.

[18]  Lester W. Mackey, et al. Measuring Sample Quality with Stein's Method, 2015, NIPS.

[19]  Arthur Gretton, et al. A low variance consistent test of relative dependency, 2015, ICML.

[20]  Zoubin Ghahramani, et al. Statistical Model Criticism using Kernel Two Sample Tests, 2015, NIPS.

[21]  Lester W. Mackey, et al. Measuring Sample Quality with Diffusions, 2016, The Annals of Applied Probability.

[22]  Arthur Gretton, et al. A Test of Relative Similarity For Model Selection in Generative Models, 2015, ICLR.

[23]  Qiang Liu, et al. A Kernelized Stein Discrepancy for Goodness-of-fit Tests, 2016, ICML.

[24]  Dustin Tran, et al. Operator Variational Inference, 2016, NIPS.

[25]  Arthur Gretton, et al. A Kernel Test of Goodness of Fit, 2016, ICML.

[26]  Kenji Fukumizu, et al. A Linear-Time Kernel Goodness-of-Fit Test, 2017, NIPS.

[27]  Aad van der Vaart, et al. Fundamentals of Nonparametric Bayesian Inference, 2017.

[28]  Lester W. Mackey, et al. Measuring Sample Quality with Kernels, 2017, ICML.

[29]  Bernhard Schölkopf, et al. Informative Features for Model Comparison, 2018, NeurIPS.

[30]  Qiang Liu, et al. Goodness-of-fit Testing for Discrete Distributions via Stein Discrepancy, 2018, ICML.

[31]  Lester Mackey, et al. Random Feature Stein Discrepancies, 2018, NeurIPS.

[32]  G. Reinert, et al. Approximating stationary distributions of fast mixing Glauber dynamics, with applications to exponential random graphs, 2017, The Annals of Applied Probability.

[33]  Neeraj Pradhan, et al. Composable Effects for Flexible and Accelerated Probabilistic Programming in NumPyro, 2019, ArXiv.

[34]  Guy Bresler, et al. Stein’s method for stationary distributions of Markov chains and application to Ising models, 2017, The Annals of Applied Probability.