A Variational Perspective on Diffusion-Based Generative Models and Score Matching

Discrete-time diffusion-based generative models and score matching methods have shown promising results in modeling high-dimensional image data. Recently, Song et al. (2021) showed that diffusion processes can be reversed by learning the score function, i.e., the gradient of the log-density of the perturbed data. They proposed plugging the learned score function into a reverse-time formula to define a generative diffusion process. Despite this empirical success, a theoretical underpinning of the procedure has been lacking. In this work, we approach the (continuous-time) generative diffusion directly and derive a variational framework for likelihood estimation, which includes continuous-time normalizing flows as a special case and can be seen as an infinitely deep variational autoencoder. Under this framework, we show that minimizing the score-matching loss is equivalent to maximizing the ELBO of the plug-in reverse SDE proposed by Song et al. (2021), bridging the theoretical gap.
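Concretely, the plug-in reverse SDE substitutes a learned score s_θ(x, t) for the true score ∇ log p_t(x) in Anderson's reverse-time formula, dx = [f(x, t) − g(t)² s_θ(x, t)] dt + g(t) dW̄, and the equivalence above says that training s_θ by suitably weighted denoising score matching maximizes an ELBO for that plug-in model. As a minimal sketch of that training objective, the snippet below implements weighted denoising score matching for a variance-preserving (VP) forward SDE; the linear beta schedule, its endpoint values, and the `score_net(x, t)` interface are illustrative assumptions rather than the paper's exact configuration.

```python
import torch

def dsm_loss(score_net, x0, eps=1e-3):
    """Weighted denoising score-matching loss for a VP-SDE (a sketch).

    ``score_net(x, t)`` is a hypothetical model mapping a [batch, dim]
    tensor and a [batch] time vector to a [batch, dim] score estimate;
    ``x0`` is a batch of flattened data.
    """
    batch = x0.shape[0]
    # Sample t uniformly in (eps, 1]; eps avoids the singular t = 0 endpoint.
    t = torch.rand(batch, device=x0.device) * (1.0 - eps) + eps
    # VP-SDE marginal: x_t = alpha(t) x_0 + sigma(t) z, with a linear schedule
    # beta(t) = beta_min + t (beta_max - beta_min). Endpoint values assumed.
    beta_min, beta_max = 0.1, 20.0
    log_alpha = -0.25 * t**2 * (beta_max - beta_min) - 0.5 * t * beta_min
    alpha = torch.exp(log_alpha).unsqueeze(-1)   # [batch, 1]
    sigma = torch.sqrt(1.0 - alpha**2)           # [batch, 1]
    z = torch.randn_like(x0)
    xt = alpha * x0 + sigma * z
    # Score of the Gaussian perturbation kernel:
    # grad_x log p(x_t | x_0) = -z / sigma(t).
    target = -z / sigma
    # lambda(t) = sigma(t)^2 weighting; with this choice the loss matches,
    # up to an additive constant, the ELBO-style objective discussed above.
    return (sigma**2 * (score_net(xt, t) - target) ** 2).sum(-1).mean()
```

With this weighting, minimizing the loss over (x0, t) is the score-matching objective that the variational framework identifies with the ELBO of the plug-in reverse SDE.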

[1] G. N. Mil’shtejn. Approximate Integration of Stochastic Differential Equations. 1975.

[2] David Duvenaud et al. Infinitely Deep Bayesian Neural Networks with Stochastic Differential Equations. arXiv, 2021.

[3] Aapo Hyvärinen et al. Estimation of Non-Normalized Statistical Models by Score Matching. Journal of Machine Learning Research, 2005.

[4] Pieter Abbeel et al. Denoising Diffusion Probabilistic Models. NeurIPS, 2020.

[5] K. Jarrod Millman et al. Array Programming with NumPy. Nature, 2020.

[6] Yang Song et al. Sliced Score Matching: A Scalable Approach to Density and Score Estimation. UAI, 2019.

[7] Surya Ganguli et al. Variational Walkback: Learning a Transition Operator as a Stochastic Recurrent Net. NIPS, 2017.

[8] Alex Krizhevsky et al. Learning Multiple Layers of Features from Tiny Images. 2009.

[9] Jonathan Ho et al. Variational Diffusion Models. arXiv, 2021.

[10] Daan Wierstra et al. Stochastic Backpropagation and Approximate Inference in Deep Generative Models. ICML, 2014.

[11] Alexandre Lacoste et al. Neural Autoregressive Flows. ICML, 2018.

[12] Travis E. Oliphant et al. Python for Scientific Computing. Computing in Science & Engineering, 2007.

[13] Radford M. Neal. Annealed Importance Sampling. Statistics and Computing, 1998.

[14] Xiongzhi Chen. Brownian Motion and Stochastic Calculus. 2008.

[15] Noah Snavely et al. Learning Gradient Fields for Shape Generation. ECCV, 2020.

[16] Prafulla Dhariwal et al. Diffusion Models Beat GANs on Image Synthesis. NeurIPS, 2021.

[17] Siwei Lyu et al. Interpretation and Generalization of Score Matching. UAI, 2009.

[18] Stefano Ermon et al. Improved Techniques for Training Score-Based Generative Models. NeurIPS, 2020.

[19] Yang Song et al. Generative Modeling by Estimating Gradients of the Data Distribution. NeurIPS, 2019.

[20] Ole Winther et al. SurVAE Flows: Surjections to Bridge the Gap between VAEs and Flows. NeurIPS, 2020.

[21] David Duvenaud et al. Neural Ordinary Differential Equations. NeurIPS, 2018.

[22] David Duvenaud et al. FFJORD: Free-form Continuous Dynamics for Scalable Reversible Generative Models. ICLR, 2018.

[23] Maxim Raginsky et al. Neural Stochastic Differential Equations: Deep Latent Gaussian Models in the Diffusion Limit. arXiv, 2019.

[24] Yang Song et al. On Maximum Likelihood Training of Score-Based Generative Models. arXiv, 2021.

[25] B. Anderson. Reverse-Time Diffusion Equation Models. 1982.

[26] S. Shreve et al. Stochastic Differential Equations. Mathematical Proceedings of the Cambridge Philosophical Society, 1955.

[27] Wei Ping et al. DiffWave: A Versatile Diffusion Model for Audio Synthesis. ICLR, 2020.

[28] Thomas M. Cover et al. Elements of Information Theory. 2005.

[29] Brian D. O. Anderson et al. Reverse Time Diffusions. 1985.

[30] Travis E. Oliphant et al. Guide to NumPy. 2015.

[31] Didrik Nielsen et al. Argmax Flows and Multinomial Diffusion: Towards Non-Autoregressive Language Models. arXiv, 2021.

[32] Linqi Zhou et al. 3D Shape Generation and Completion through Point-Voxel Diffusion. arXiv, 2021.

[33] David J. Fleet et al. Image Super-Resolution via Iterative Refinement. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2021.

[34] Varun Jog et al. Information and Estimation in Fokker-Planck Channels. IEEE International Symposium on Information Theory (ISIT), 2017.

[35] Yoshua Bengio et al. Gradient-Based Learning Applied to Document Recognition. Proceedings of the IEEE, 1998.

[36] Eric Jones et al. SciPy: Open Source Scientific Tools for Python. 2001.

[37] Diederik P. Kingma et al. How to Train Your Energy-Based Models. arXiv, 2021.

[38] Joseph Sill et al. Monotonic Networks. NIPS, 1997.

[39] M. Hutchinson. A Stochastic Estimator of the Trace of the Influence Matrix for Laplacian Smoothing Splines. 1989.

[40] Razvan Pascanu et al. A RAD Approach to Deep Mixture Models. DGS@ICLR, 2019.

[41] U. Haussmann et al. Time Reversal of Diffusions. 1986.

[42] Surya Ganguli et al. Deep Unsupervised Learning Using Nonequilibrium Thermodynamics. ICML, 2015.

[43] P. Dupuis et al. A Variational Representation for Certain Functionals of Brownian Motion. 1998.

[44] Pascal Vincent et al. A Connection Between Score Matching and Denoising Autoencoders. Neural Computation, 2011.

[45] Gaël Varoquaux et al. The NumPy Array: A Structure for Efficient Numerical Computation. Computing in Science & Engineering, 2011.

[46] David Duvenaud et al. Scalable Gradients for Stochastic Differential Equations. AISTATS, 2020.

[47] Hans Föllmer. An Entropy Approach to the Time Reversal of Diffusion Processes. 1985.

[48] Samy Bengio et al. Density Estimation Using Real NVP. ICLR, 2016.

[49] Marina Velikova et al. Monotone and Partially Monotone Neural Networks. IEEE Transactions on Neural Networks, 2010.

[50] Abhishek Kumar et al. Score-Based Generative Modeling through Stochastic Differential Equations. ICLR, 2020.

[51] C. Jarzynski. Equilibrium Free-Energy Differences from Nonequilibrium Measurements: A Master-Equation Approach. cond-mat/9707325, 1997.

[52] Natalia Gimelshein et al. PyTorch: An Imperative Style, High-Performance Deep Learning Library. NeurIPS, 2019.

[53] Thomas Brox et al. U-Net: Convolutional Networks for Biomedical Image Segmentation. MICCAI, 2015.

[54] John D. Hunter. Matplotlib: A 2D Graphics Environment. Computing in Science & Engineering, 2007.

[55] Jeff Donahue et al. Large Scale GAN Training for High Fidelity Natural Image Synthesis. ICLR, 2018.

[56] Max Welling et al. Auto-Encoding Variational Bayes. ICLR, 2013.

[57] L. Ungar et al. Estimating Monotonic Functions and Their Bounds. 1999.