Learning Implicit Generative Models Using Differentiable Graph Tests

Recently, there has been growing interest in the problem of learning rich implicit models: models from which we can sample, but whose density we cannot evaluate. These models apply a parametric function, such as a deep network, to a base measure, and are learned end-to-end using stochastic optimization. One strategy for devising a loss function is through the statistics of two-sample tests: if we can fool a statistical test, the learned distribution should be a good model of the true data. However, not all tests fit easily into this framework, as they may not be differentiable with respect to the data points, and hence with respect to the parameters of the implicit model. Motivated by this problem, in this paper we show how two such classical tests, the Friedman-Rafsky and k-nearest-neighbour tests, can be effectively smoothed using ideas from undirected graphical models: the matrix tree theorem and cardinality potentials. Moreover, as we show experimentally, smoothing can significantly increase the power of the test, which may be of independent interest. Finally, we apply our method to learn implicit models.
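
To make the setup concrete, below is a minimal sketch (our own, not code from the paper) of the classical, unsmoothed Friedman-Rafsky statistic: the number of edges in the minimum spanning tree of the pooled sample whose endpoints come from different samples. It is this piecewise-constant count, non-differentiable in the data points, that the paper smooths via the matrix tree theorem. The function name and interface here are illustrative.

```python
import numpy as np
from scipy.spatial.distance import cdist
from scipy.sparse.csgraph import minimum_spanning_tree

def friedman_rafsky_statistic(X, Y):
    """Count cross-sample edges in the MST of the pooled sample.

    A small count suggests the two samples come from different
    distributions (points cluster by sample). Assumes all pairwise
    distances are positive, i.e. no duplicate points.
    """
    Z = np.vstack([X, Y])
    labels = np.concatenate([np.zeros(len(X)), np.ones(len(Y))])
    # Dense pairwise Euclidean distances; MST of the complete graph.
    D = cdist(Z, Z)
    mst = minimum_spanning_tree(D).tocoo()
    # An edge (i, j) is a "cross" edge if its endpoints carry
    # different sample labels.
    return int(np.sum(labels[mst.row] != labels[mst.col]))
```

For intuition about what a differentiable surrogate looks like, here is an illustrative soft relaxation of the 1-nearest-neighbour test: each point's hard nearest neighbour is replaced by a softmax over distances, so the statistic (the expected fraction of same-sample neighbours) becomes differentiable in the data points. This is a stand-in of our own, not the paper's cardinality-potential construction; the function name and the temperature parameter `tau` are assumptions.

```python
import torch

def soft_knn_statistic(X, Y, tau=1.0):
    """Soft 1-NN two-sample statistic (illustrative relaxation only)."""
    Z = torch.cat([X, Y], dim=0)
    labels = torch.cat([torch.zeros(len(X)), torch.ones(len(Y))])
    D = torch.cdist(Z, Z)
    # Mask self-distances so a point cannot pick itself as neighbour.
    D = D.masked_fill(torch.eye(len(Z), dtype=torch.bool), float('inf'))
    # Soft neighbour assignment; a hard 1-NN is recovered as tau -> 0.
    P = torch.softmax(-D / tau, dim=1)
    same = (labels[:, None] == labels[None, :]).float()
    # Expected fraction of same-sample neighbours. Because this is
    # differentiable in X and Y, its gradient can flow back into the
    # parameters of an implicit model that generated one of the samples.
    return (P * same).sum(dim=1).mean()
```

The temperature trades off fidelity to the hard test (small `tau`) against gradient quality (large `tau`), mirroring the bias-variance trade-off that any smoothing of a combinatorial test statistic must navigate.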
