On Maximum Likelihood Training of Score-Based Generative Models

Score-based generative modeling has recently emerged as a promising alternative to traditional likelihood-based or implicit approaches. Learning in score-based models involves first perturbing data with a continuous-time stochastic process, and then matching the time-dependent gradient of the logarithm of the noisy data density (the score function) using a continuous mixture of score matching losses. In this note, we show that such an objective is equivalent to maximum likelihood for certain choices of mixture weighting. This connection provides a principled way to weight the objective function, and justifies its use for comparing different score-based generative models. Taken together with previous work, our result reveals that both maximum likelihood training and test-time log-likelihood evaluation can be achieved through parameterization of the score function alone, without the need to explicitly parameterize a density function.
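
To make the objective concrete, below is a minimal sketch of the weighted (denoising) score matching loss that the abstract describes, written for a variance-preserving SDE with a linear noise schedule. This is not the authors' code: the schedule constants, function names, and dummy score model are illustrative assumptions, and the option lambda(t) = g(t)^2 reflects the weighting that the full paper associates with maximum likelihood training (with lambda(t) = sigma(t)^2 shown as the common empirical alternative).

```python
import numpy as np

# Minimal sketch (assumptions, not the paper's code): weighted denoising score
# matching for a variance-preserving (VP) SDE with a linear beta(t) schedule.
BETA_MIN, BETA_MAX = 0.1, 20.0

def beta(t):
    """Squared diffusion coefficient, g(t)^2 = beta(t), for the VP SDE."""
    return BETA_MIN + t * (BETA_MAX - BETA_MIN)

def perturbation_kernel(t):
    """Mean scale alpha(t) and std sigma(t) of p_0t(x_t | x_0) under the VP SDE."""
    log_mean_coeff = -0.5 * (BETA_MIN * t + 0.5 * (BETA_MAX - BETA_MIN) * t**2)
    alpha = np.exp(log_mean_coeff)
    sigma = np.sqrt(1.0 - np.exp(2.0 * log_mean_coeff))
    return alpha, sigma

def weighted_dsm_loss(score_fn, x0, rng, likelihood_weighting=True, eps=1e-5):
    """Monte Carlo estimate of E_t[lambda(t) * E||s_theta(x_t,t) - grad log p_0t(x_t|x_0)||^2]."""
    n = x0.shape[0]
    t = rng.uniform(eps, 1.0, size=(n, 1))        # random diffusion times in (0, 1]
    alpha, sigma = perturbation_kernel(t)
    noise = rng.standard_normal(x0.shape)
    xt = alpha * x0 + sigma * noise               # sample from the perturbation kernel
    target = -noise / sigma                       # grad_x log p_0t(x_t | x_0)
    sq_err = np.sum((score_fn(xt, t) - target) ** 2, axis=1)
    # lambda(t) = g(t)^2 is the "likelihood" weighting; lambda(t) = sigma(t)^2
    # is the weighting commonly used in earlier score-based models.
    lam = beta(t)[:, 0] if likelihood_weighting else (sigma ** 2)[:, 0]
    return np.mean(lam * sq_err)

# Usage with a hypothetical placeholder in place of a trained score network s_theta(x, t):
rng = np.random.default_rng(0)
x0 = rng.standard_normal((128, 2))
dummy_score = lambda x, t: -x
print(weighted_dsm_loss(dummy_score, x0, rng))
```

In a full implementation, `score_fn` would be a neural network trained by minimizing this loss over minibatches; the point of the note is that, under the likelihood weighting, doing so corresponds to maximum likelihood training of the induced generative model.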
