On Maximum Likelihood Training of Score-Based Generative Models

Score-based generative modeling has recently emerged as a promising alternative to traditional likelihood-based or implicit approaches. Learning in score-based models involves first perturbing data with a continuous-time stochastic process, and then matching the time-dependent gradient of the logarithm of the noisy data density (the score function) using a continuous mixture of score matching losses. In this note, we show that such an objective is equivalent to maximum likelihood for certain choices of mixture weighting. This connection provides a principled way to weight the objective function, and justifies its use for comparing different score-based generative models. Taken together with previous work, our result reveals that both maximum likelihood training and test-time log-likelihood evaluation can be achieved through parameterization of the score function alone, without the need to explicitly parameterize a density function.
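
To make the objective concrete, below is a minimal sketch of the weighted (denoising) score matching loss that the abstract describes, written for a variance-preserving SDE with a linear noise schedule. This is not the authors' code: the schedule constants, function names, and dummy score model are illustrative assumptions, and the option lambda(t) = g(t)^2 reflects the weighting that the full paper associates with maximum likelihood training (with lambda(t) = sigma(t)^2 shown as the common empirical alternative).

```python
import numpy as np

# Minimal sketch (assumptions, not the paper's code): weighted denoising score
# matching for a variance-preserving (VP) SDE with a linear beta(t) schedule.
BETA_MIN, BETA_MAX = 0.1, 20.0

def beta(t):
    """Squared diffusion coefficient, g(t)^2 = beta(t), for the VP SDE."""
    return BETA_MIN + t * (BETA_MAX - BETA_MIN)

def perturbation_kernel(t):
    """Mean scale alpha(t) and std sigma(t) of p_0t(x_t | x_0) under the VP SDE."""
    log_mean_coeff = -0.5 * (BETA_MIN * t + 0.5 * (BETA_MAX - BETA_MIN) * t**2)
    alpha = np.exp(log_mean_coeff)
    sigma = np.sqrt(1.0 - np.exp(2.0 * log_mean_coeff))
    return alpha, sigma

def weighted_dsm_loss(score_fn, x0, rng, likelihood_weighting=True, eps=1e-5):
    """Monte Carlo estimate of E_t[lambda(t) * E||s_theta(x_t,t) - grad log p_0t(x_t|x_0)||^2]."""
    n = x0.shape[0]
    t = rng.uniform(eps, 1.0, size=(n, 1))        # random diffusion times in (0, 1]
    alpha, sigma = perturbation_kernel(t)
    noise = rng.standard_normal(x0.shape)
    xt = alpha * x0 + sigma * noise               # sample from the perturbation kernel
    target = -noise / sigma                       # grad_x log p_0t(x_t | x_0)
    sq_err = np.sum((score_fn(xt, t) - target) ** 2, axis=1)
    # lambda(t) = g(t)^2 is the "likelihood" weighting; lambda(t) = sigma(t)^2
    # is the weighting commonly used in earlier score-based models.
    lam = beta(t)[:, 0] if likelihood_weighting else (sigma ** 2)[:, 0]
    return np.mean(lam * sq_err)

# Usage with a hypothetical placeholder in place of a trained score network s_theta(x, t):
rng = np.random.default_rng(0)
x0 = rng.standard_normal((128, 2))
dummy_score = lambda x, t: -x
print(weighted_dsm_loss(dummy_score, x0, rng))
```

In a full implementation, `score_fn` would be a neural network trained by minimizing this loss over minibatches; the point of the note is that, under the likelihood weighting, doing so corresponds to maximum likelihood training of the induced generative model.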
