A Variational Perspective on Diffusion-Based Generative Models and Score Matching

Discrete-time diffusion-based generative models and score matching methods have shown promising results in modeling high-dimensional image data. Recently, Song et al. (2021) showed that diffusion processes can be reversed by learning the score function, i.e., the gradient of the log-density of the perturbed data. They proposed plugging the learned score function into a reverse-time formula to define a generative diffusion process. Despite this empirical success, a theoretical underpinning of the procedure has been lacking. In this work, we approach the (continuous-time) generative diffusion directly and derive a variational framework for likelihood estimation, which includes continuous-time normalizing flows as a special case and can be seen as an infinitely deep variational autoencoder. Under this framework, we show that minimizing the score-matching loss is equivalent to maximizing the ELBO of the plug-in reverse SDE proposed by Song et al. (2021), bridging the theoretical gap.
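Concretely, the plug-in reverse SDE substitutes a learned score s_θ(x, t) for the true score ∇ log p_t(x) in Anderson's reverse-time formula, dx = [f(x, t) − g(t)² s_θ(x, t)] dt + g(t) dW̄, and the equivalence above says that training s_θ by suitably weighted denoising score matching maximizes an ELBO for that plug-in model. As a minimal sketch of that training objective, the snippet below implements weighted denoising score matching for a variance-preserving (VP) forward SDE; the linear beta schedule, its endpoint values, and the `score_net(x, t)` interface are illustrative assumptions rather than the paper's exact configuration.

```python
import torch

def dsm_loss(score_net, x0, eps=1e-3):
    """Weighted denoising score-matching loss for a VP-SDE (a sketch).

    ``score_net(x, t)`` is a hypothetical model mapping a [batch, dim]
    tensor and a [batch] time vector to a [batch, dim] score estimate;
    ``x0`` is a batch of flattened data.
    """
    batch = x0.shape[0]
    # Sample t uniformly in (eps, 1]; eps avoids the singular t = 0 endpoint.
    t = torch.rand(batch, device=x0.device) * (1.0 - eps) + eps
    # VP-SDE marginal: x_t = alpha(t) x_0 + sigma(t) z, with a linear schedule
    # beta(t) = beta_min + t (beta_max - beta_min). Endpoint values assumed.
    beta_min, beta_max = 0.1, 20.0
    log_alpha = -0.25 * t**2 * (beta_max - beta_min) - 0.5 * t * beta_min
    alpha = torch.exp(log_alpha).unsqueeze(-1)   # [batch, 1]
    sigma = torch.sqrt(1.0 - alpha**2)           # [batch, 1]
    z = torch.randn_like(x0)
    xt = alpha * x0 + sigma * z
    # Score of the Gaussian perturbation kernel:
    # grad_x log p(x_t | x_0) = -z / sigma(t).
    target = -z / sigma
    # lambda(t) = sigma(t)^2 weighting; with this choice the loss matches,
    # up to an additive constant, the ELBO-style objective discussed above.
    return (sigma**2 * (score_net(xt, t) - target) ** 2).sum(-1).mean()
```

With this weighting, minimizing the loss over (x0, t) is the score-matching objective that the variational framework identifies with the ELBO of the plug-in reverse SDE.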

[1] G. N. Mil’shtejn. Approximate Integration of Stochastic Differential Equations. 1975.

[2] David Duvenaud et al. Infinitely Deep Bayesian Neural Networks with Stochastic Differential Equations. arXiv, 2021.

[3] Aapo Hyvärinen et al. Estimation of Non-Normalized Statistical Models by Score Matching. Journal of Machine Learning Research, 2005.

[4] Pieter Abbeel et al. Denoising Diffusion Probabilistic Models. NeurIPS, 2020.

[5] K. Jarrod Millman et al. Array Programming with NumPy. Nature, 2020.

[6] Yang Song et al. Sliced Score Matching: A Scalable Approach to Density and Score Estimation. UAI, 2019.

[7] Surya Ganguli et al. Variational Walkback: Learning a Transition Operator as a Stochastic Recurrent Net. NIPS, 2017.

[8] Alex Krizhevsky et al. Learning Multiple Layers of Features from Tiny Images. 2009.

[9] Jonathan Ho et al. Variational Diffusion Models. arXiv, 2021.

[10] Daan Wierstra et al. Stochastic Backpropagation and Approximate Inference in Deep Generative Models. ICML, 2014.

[11] Alexandre Lacoste et al. Neural Autoregressive Flows. ICML, 2018.

[12] Travis E. Oliphant et al. Python for Scientific Computing. Computing in Science & Engineering, 2007.

[13] Radford M. Neal. Annealed Importance Sampling. Statistics and Computing, 1998.

[14] Xiongzhi Chen. Brownian Motion and Stochastic Calculus. 2008.

[15] Noah Snavely et al. Learning Gradient Fields for Shape Generation. ECCV, 2020.

[16] Prafulla Dhariwal et al. Diffusion Models Beat GANs on Image Synthesis. NeurIPS, 2021.

[17] Siwei Lyu et al. Interpretation and Generalization of Score Matching. UAI, 2009.

[18] Stefano Ermon et al. Improved Techniques for Training Score-Based Generative Models. NeurIPS, 2020.

[19] Yang Song et al. Generative Modeling by Estimating Gradients of the Data Distribution. NeurIPS, 2019.

[20] Ole Winther et al. SurVAE Flows: Surjections to Bridge the Gap between VAEs and Flows. NeurIPS, 2020.

[21] David Duvenaud et al. Neural Ordinary Differential Equations. NeurIPS, 2018.

[22] David Duvenaud et al. FFJORD: Free-form Continuous Dynamics for Scalable Reversible Generative Models. ICLR, 2018.

[23] Maxim Raginsky et al. Neural Stochastic Differential Equations: Deep Latent Gaussian Models in the Diffusion Limit. arXiv, 2019.

[24] Yang Song et al. On Maximum Likelihood Training of Score-Based Generative Models. arXiv, 2021.

[25] B. Anderson. Reverse-Time Diffusion Equation Models. 1982.

[26] S. Shreve et al. Stochastic Differential Equations. Mathematical Proceedings of the Cambridge Philosophical Society, 1955.

[27] Wei Ping et al. DiffWave: A Versatile Diffusion Model for Audio Synthesis. ICLR, 2020.

[28] Thomas M. Cover et al. Elements of Information Theory. 2005.

[29] Brian D. O. Anderson et al. Reverse Time Diffusions. 1985.

[30] Travis E. Oliphant et al. Guide to NumPy. 2015.

[31] Didrik Nielsen et al. Argmax Flows and Multinomial Diffusion: Towards Non-Autoregressive Language Models. arXiv, 2021.

[32] Linqi Zhou et al. 3D Shape Generation and Completion through Point-Voxel Diffusion. arXiv, 2021.

[33] David J. Fleet et al. Image Super-Resolution via Iterative Refinement. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2021.

[34] Varun Jog et al. Information and Estimation in Fokker-Planck Channels. IEEE International Symposium on Information Theory (ISIT), 2017.

[35] Yoshua Bengio et al. Gradient-Based Learning Applied to Document Recognition. Proceedings of the IEEE, 1998.

[36] Eric Jones et al. SciPy: Open Source Scientific Tools for Python. 2001.

[37] Diederik P. Kingma et al. How to Train Your Energy-Based Models. arXiv, 2021.

[38] Joseph Sill et al. Monotonic Networks. NIPS, 1997.

[39] M. Hutchinson. A Stochastic Estimator of the Trace of the Influence Matrix for Laplacian Smoothing Splines. 1989.

[40] Razvan Pascanu et al. A RAD Approach to Deep Mixture Models. DGS@ICLR, 2019.

[41] U. Haussmann et al. Time Reversal of Diffusions. 1986.

[42] Surya Ganguli et al. Deep Unsupervised Learning Using Nonequilibrium Thermodynamics. ICML, 2015.

[43] P. Dupuis et al. A Variational Representation for Certain Functionals of Brownian Motion. 1998.

[44] Pascal Vincent et al. A Connection Between Score Matching and Denoising Autoencoders. Neural Computation, 2011.

[45] Gaël Varoquaux et al. The NumPy Array: A Structure for Efficient Numerical Computation. Computing in Science & Engineering, 2011.

[46] David Duvenaud et al. Scalable Gradients for Stochastic Differential Equations. AISTATS, 2020.

[47] Hans Föllmer. An Entropy Approach to the Time Reversal of Diffusion Processes. 1985.

[48] Samy Bengio et al. Density Estimation Using Real NVP. ICLR, 2016.

[49] Marina Velikova et al. Monotone and Partially Monotone Neural Networks. IEEE Transactions on Neural Networks, 2010.

[50] Abhishek Kumar et al. Score-Based Generative Modeling through Stochastic Differential Equations. ICLR, 2020.

[51] C. Jarzynski. Equilibrium Free-Energy Differences from Nonequilibrium Measurements: A Master-Equation Approach. cond-mat/9707325, 1997.

[52] Natalia Gimelshein et al. PyTorch: An Imperative Style, High-Performance Deep Learning Library. NeurIPS, 2019.

[53] Thomas Brox et al. U-Net: Convolutional Networks for Biomedical Image Segmentation. MICCAI, 2015.

[54] John D. Hunter. Matplotlib: A 2D Graphics Environment. Computing in Science & Engineering, 2007.

[55] Jeff Donahue et al. Large Scale GAN Training for High Fidelity Natural Image Synthesis. ICLR, 2018.

[56] Max Welling et al. Auto-Encoding Variational Bayes. ICLR, 2013.

[57] L. Ungar et al. Estimating Monotonic Functions and Their Bounds. 1999.