A Wasserstein Minimum Velocity Approach to Learning Unnormalized Models

Score matching is an effective approach to learning flexible unnormalized models, but its scalability is limited by the need to evaluate second-order derivatives of the model. In this paper, we present a scalable approximation to a general family of learning objectives, including score matching, by observing a new connection between these objectives and Wasserstein gradient flows. We demonstrate promising applications to learning neural density estimators on manifolds, and to training implicit variational and Wasserstein auto-encoders with manifold-valued priors.
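To make the scalability issue concrete, the sketch below shows the standard Euclidean score matching objective, E_x[ tr(∇_x s(x)) + 0.5 ||s(x)||^2 ], with the expensive Hessian-trace term replaced by a Hutchinson-style random-projection estimate, in the spirit of sliced score matching. This is only an illustrative, assumed implementation (the names `score_matching_loss` and `score_net` and the hyperparameters are not from the paper), not the Wasserstein minimum velocity method itself.

```python
import torch


def score_matching_loss(score_net, x, n_projections=1):
    """Hutchinson-style estimate of the score matching objective
    E_x[ tr(grad_x s(x)) + 0.5 * ||s(x)||^2 ]: the Hessian trace is
    replaced by v^T grad_x (v^T s(x)) averaged over random vectors v.
    Assumes score_net maps a (B, D) batch to (B, D) scores with no
    cross-example interactions (e.g. no batch norm)."""
    x = x.detach().requires_grad_(True)
    s = score_net(x)                          # model score s_theta(x), shape (B, D)
    norm_term = 0.5 * (s ** 2).sum(dim=-1)    # 0.5 * ||s(x)||^2 per example

    trace_term = torch.zeros_like(norm_term)
    for _ in range(n_projections):
        v = torch.randn_like(x)               # random projection (Rademacher also works)
        sv = (s * v).sum()                    # scalar: sum over the batch of v^T s(x)
        grad_sv = torch.autograd.grad(sv, x, create_graph=True)[0]
        trace_term = trace_term + (grad_sv * v).sum(dim=-1)
    trace_term = trace_term / n_projections

    return (norm_term + trace_term).mean()
```

With this estimator, exact score matching's D extra backward passes per example (one per coordinate of the Hessian diagonal) are replaced by `n_projections` vector-Jacobian products, which is what makes such objectives practical for high-dimensional neural models.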
