Cascaded Diffusion Models for High Fidelity Image Generation

We show that cascaded diffusion models can generate high fidelity images on the class-conditional ImageNet generation benchmark, without any assistance from auxiliary image classifiers to boost sample quality. A cascaded diffusion model comprises a pipeline of multiple diffusion models that generate images of increasing resolution, beginning with a standard diffusion model at the lowest resolution, followed by one or more super-resolution diffusion models that successively upsample the image and add higher-resolution detail. We find that the sample quality of a cascading pipeline relies crucially on conditioning augmentation, our proposed method of applying data augmentation to the lower-resolution conditioning inputs of the super-resolution models. Our experiments show that conditioning augmentation prevents the compounding of errors during sampling in a cascaded model, helping us train cascading pipelines that achieve FID scores of 1.48 at 64×64, 3.52 at 128×128, and 4.88 at 256×256 resolution, outperforming BigGAN-deep.
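To make the pipeline concrete, here is a minimal sketch of cascaded sampling with Gaussian-noise conditioning augmentation. It is illustrative only: the `base_model.sample` and `sr_model.sample` interfaces, the example resolution, and the per-stage noise scales `aug_levels` are assumptions for this sketch, not the authors' released API; Gaussian noise is shown as one simple instance of the augmentation family described above.

```python
import torch

def conditioning_augmentation(low_res, sigma):
    """Augment a low-resolution conditioning input with Gaussian noise
    (one simple form of conditioning augmentation)."""
    return low_res + sigma * torch.randn_like(low_res)

def sample_cascade(base_model, sr_models, class_label, aug_levels):
    """Sample from a cascaded diffusion pipeline: a base diffusion model
    at the lowest resolution, followed by super-resolution diffusion
    models that successively upsample and refine.

    base_model / sr_models expose a hypothetical .sample() interface.
    """
    # Stage 0: sample a low-resolution image from the base model.
    x = base_model.sample(class_label)  # e.g. a (3, 32, 32) tensor

    # Later stages: condition each super-resolution model on an
    # augmented copy of the previous stage's output, so that errors
    # made by earlier stages do not compound during sampling.
    for sr_model, sigma in zip(sr_models, aug_levels):
        z = conditioning_augmentation(x, sigma)
        x = sr_model.sample(class_label, low_res_cond=z)
    return x
```

During training, the same augmentation would be applied to the ground-truth low-resolution inputs, so each super-resolution model learns to be robust to the imperfect samples it will receive from the preceding stage at generation time.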
