Testing Goodness of Fit of Conditional Density Models with Kernels

We propose two nonparametric statistical tests of goodness of fit for conditional distributions: given a conditional probability density function $p(y|x)$ and a joint sample, decide whether the sample is drawn from $p(y|x)r_x(x)$ for some density $r_x$. Our tests are formulated with a Stein operator. They can be applied to any differentiable conditional density model and require no knowledge of the normalizing constant. We show that 1) our tests are consistent against any fixed alternative conditional model; 2) the statistics are easy to estimate and require no density estimation as an intermediate step; and 3) our second test gives an interpretable result, providing insight into where in the domain of the covariate the conditional model fits poorly. We demonstrate the interpretability of our test on the task of modeling the distribution of taxi drop-off locations in New York City given a pick-up point. To our knowledge, our work is the first to propose conditional goodness-of-fit tests that have all of these properties simultaneously.
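A Stein-operator statistic avoids the normalizing constant because it touches the model only through the conditional score $\partial_y \log p(y|x)$. The sketch below illustrates this with a U-statistic for scalar $x$ and $y$. It assumes that the statistic takes the form of a kernelized Stein discrepancy in $y$ weighted by a kernel $k(x, x')$ on the covariate. It also assumes Gaussian kernels with unit bandwidths and a toy model $p(y|x) = \mathcal{N}(y; 2x, 1)$. The function names (`gauss_kernel`, `stein_pair`, `kcsd_ustat`, `model_score`) and the toy data are hypothetical illustrations, not the paper's implementation. The sketch omits multivariate data, the rejection threshold, and the interpretable second test.

```python
import math
import random


def gauss_kernel(a, b, bw):
    """Gaussian kernel exp(-(a - b)^2 / (2 * bw^2)) on scalars (assumed kernel choice)."""
    return math.exp(-((a - b) ** 2) / (2.0 * bw ** 2))


def stein_pair(x1, y1, s1, x2, y2, s2, bw_x=1.0, bw_y=1.0):
    """h((x1, y1), (x2, y2)) = k(x1, x2) * u_p(y1, y2).

    s1, s2 are conditional scores d/dy log p(y|x) at the two points; u_p is the
    1-D Stein kernel built from a Gaussian kernel l on y.
    """
    k = gauss_kernel(x1, x2, bw_x)
    l = gauss_kernel(y1, y2, bw_y)
    diff = y1 - y2
    dl_dy1 = -diff / bw_y ** 2 * l                        # dl/dy1
    dl_dy2 = diff / bw_y ** 2 * l                         # dl/dy2
    d2l = (1.0 / bw_y ** 2 - diff ** 2 / bw_y ** 4) * l   # d^2 l / (dy1 dy2)
    return k * (s1 * s2 * l + s1 * dl_dy2 + s2 * dl_dy1 + d2l)


def kcsd_ustat(xs, ys, score, bw_x=1.0, bw_y=1.0):
    """Unbiased U-statistic: the average of h over all pairs i != j."""
    n = len(xs)
    scores = [score(y, x) for x, y in zip(xs, ys)]
    total = 0.0
    for i in range(n):
        for j in range(i + 1, n):  # h is symmetric, so visit each unordered pair once
            total += stein_pair(xs[i], ys[i], scores[i],
                                xs[j], ys[j], scores[j], bw_x, bw_y)
    return 2.0 * total / (n * (n - 1))


# Hypothetical toy model p(y|x) = N(y; 2x, 1). Its score -(y - 2x) is the
# y-derivative of the unnormalized log density -(y - 2x)^2 / 2, so the
# normalizing constant sqrt(2*pi) is never needed.
def model_score(y, x):
    return -(y - 2.0 * x)


rng = random.Random(0)
n = 300
xs = [rng.gauss(0.0, 1.0) for _ in range(n)]
ys_null = [2.0 * x + rng.gauss(0.0, 1.0) for x in xs]        # drawn from the model
ys_alt = [2.0 * x + 3.0 + rng.gauss(0.0, 1.0) for x in xs]   # conditional mean shifted by 3

stat_null = kcsd_ustat(xs, ys_null, model_score)
stat_alt = kcsd_ustat(xs, ys_alt, model_score)
print(f"null: {stat_null:.4f}  shifted: {stat_alt:.4f}")
```

In this setup, the statistic for the sample drawn from the model has expectation zero. It should therefore come out close to zero, while the shifted sample gives a clearly positive value. A complete test would also need a rejection threshold, for example from a bootstrap approximation of the null distribution, and this sketch does not compute one.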
