Improved Techniques for Training Score-Based Generative Models

Score-based generative models can produce high-quality image samples comparable to GANs, without requiring adversarial optimization. However, existing training procedures are limited to low-resolution images (typically below 32x32) and can be unstable in some settings. We provide a new theoretical analysis of learning and sampling from score models in high-dimensional spaces, explaining existing failure modes and motivating new solutions that generalize across datasets. To enhance stability, we also propose maintaining an exponential moving average of model weights. With these improvements, we can effortlessly scale score-based generative models to images with unprecedented resolutions ranging from 64x64 to 256x256. Our score-based models can generate high-fidelity samples that rival best-in-class GANs on various image datasets, including CelebA, FFHQ, and multiple LSUN categories.
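The exponential moving average (EMA) of model weights mentioned above can be sketched as follows. This is a minimal illustration, not the paper's implementation: parameters are represented as plain Python floats, and the decay value 0.999 is an illustrative default (the paper tunes such hyperparameters per dataset).

```python
class EMA:
    """Maintain an exponential moving average of model parameters.

    At sampling time, the averaged (shadow) parameters are used in
    place of the raw training parameters to stabilize sample quality.
    """

    def __init__(self, params, decay=0.999):
        self.decay = decay
        # Keep an independent shadow copy of each parameter.
        self.shadow = list(params)

    def update(self, params):
        # shadow <- decay * shadow + (1 - decay) * current params
        self.shadow = [
            self.decay * s + (1.0 - self.decay) * p
            for s, p in zip(self.shadow, params)
        ]


# Usage: after each optimizer step, call ema.update(current_params);
# sample from the model using ema.shadow instead of the live weights.
ema = EMA([1.0, 2.0], decay=0.5)
ema.update([3.0, 4.0])
print(ema.shadow)  # [2.0, 3.0]
```

In a deep-learning framework one would store the shadow copy as tensors and update it in-place after every gradient step; the arithmetic is identical.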
