A Wasserstein Minimum Velocity Approach to Learning Unnormalized Models

Score matching is an effective approach to learning flexible unnormalized models, but its scalability is limited by the need to evaluate second-order derivatives of the model. In this paper, we present a scalable approximation to a general family of learning objectives, including score matching, by observing a new connection between these objectives and Wasserstein gradient flows. We demonstrate promising applications to learning neural density estimators on manifolds, and to training implicit variational and Wasserstein auto-encoders with manifold-valued priors.
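To make the scalability issue concrete, the sketch below shows the standard Euclidean score matching objective, E_x[ tr(∇_x s(x)) + 0.5 ||s(x)||^2 ], with the expensive Hessian-trace term replaced by a Hutchinson-style random-projection estimate, in the spirit of sliced score matching. This is only an illustrative, assumed implementation (the names `score_matching_loss` and `score_net` and the hyperparameters are not from the paper), not the Wasserstein minimum velocity method itself.

```python
import torch


def score_matching_loss(score_net, x, n_projections=1):
    """Hutchinson-style estimate of the score matching objective
    E_x[ tr(grad_x s(x)) + 0.5 * ||s(x)||^2 ]: the Hessian trace is
    replaced by v^T grad_x (v^T s(x)) averaged over random vectors v.
    Assumes score_net maps a (B, D) batch to (B, D) scores with no
    cross-example interactions (e.g. no batch norm)."""
    x = x.detach().requires_grad_(True)
    s = score_net(x)                          # model score s_theta(x), shape (B, D)
    norm_term = 0.5 * (s ** 2).sum(dim=-1)    # 0.5 * ||s(x)||^2 per example

    trace_term = torch.zeros_like(norm_term)
    for _ in range(n_projections):
        v = torch.randn_like(x)               # random projection (Rademacher also works)
        sv = (s * v).sum()                    # scalar: sum over the batch of v^T s(x)
        grad_sv = torch.autograd.grad(sv, x, create_graph=True)[0]
        trace_term = trace_term + (grad_sv * v).sum(dim=-1)
    trace_term = trace_term / n_projections

    return (norm_term + trace_term).mean()
```

With this estimator, exact score matching's D extra backward passes per example (one per coordinate of the Hessian diagonal) are replaced by `n_projections` vector-Jacobian products, which is what makes such objectives practical for high-dimensional neural models.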
