Patch Diffusion: Faster and More Data-Efficient Training of Diffusion Models

Diffusion models are powerful, but they require substantial time and data to train. We propose Patch Diffusion, a generic patch-wise training framework that significantly reduces training cost while improving data efficiency, thereby helping democratize diffusion model training to broader users. At the core of our innovations is a new conditional score function at the patch level, where the patch location in the original image is included as additional coordinate channels, while the patch size is randomized and diversified throughout training to encode cross-region dependencies at multiple scales. Sampling with our method is as easy as in the original diffusion model. With Patch Diffusion, we achieve $\mathbf{\ge 2\times}$ faster training while maintaining comparable or better generation quality. Patch Diffusion also improves the performance of diffusion models trained on relatively small datasets, e.g., as few as 5,000 images for training from scratch. We achieve state-of-the-art FID scores of 1.77 on CelebA-64$\times$64 and 1.93 on AFHQv2-Wild-64$\times$64. We will release our code and pre-trained models soon.
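As a rough illustration of the patch-level conditioning described above, the sketch below shows one plausible way to crop a random patch and append normalized coordinate channels encoding its location in the full image. This is our own reconstruction under stated assumptions, not the authors' released code: the helper name `random_patch_with_coords`, the candidate patch sizes, and the use of PyTorch are all assumptions.

```python
import torch

def random_patch_with_coords(img: torch.Tensor, patch_sizes=(16, 32, 64)):
    """Illustrative sketch (assumed details, not the authors' code):
    crop a random square patch from `img` (C, H, W) and append two
    channels holding the patch's pixel coordinates in the original
    image, normalized to [-1, 1], so the score network is conditioned
    on where the patch came from."""
    _, H, W = img.shape
    # Randomize the patch size across training steps so the model sees
    # cross-region dependencies at multiple scales.
    p = int(patch_sizes[torch.randint(len(patch_sizes), (1,))])
    top = int(torch.randint(H - p + 1, (1,)))
    left = int(torch.randint(W - p + 1, (1,)))
    patch = img[:, top:top + p, left:left + p]

    # Normalized (x, y) coordinate grids for the cropped region,
    # expressed in the full image's coordinate frame.
    ys = torch.linspace(-1, 1, H)[top:top + p]
    xs = torch.linspace(-1, 1, W)[left:left + p]
    grid_y, grid_x = torch.meshgrid(ys, xs, indexing="ij")
    coords = torch.stack([grid_x, grid_y])  # (2, p, p)

    # The denoising network is then trained on this (C + 2)-channel input.
    return torch.cat([patch, coords], dim=0)

# Usage: one random patch per image per step; the diffusion loss is
# computed on the patch exactly as it would be on a full image.
x = torch.randn(3, 64, 64)
patched = random_patch_with_coords(x)
print(patched.shape)  # e.g. torch.Size([5, 32, 32])
```

Since the network only ever sees patches plus their coordinates during training, sampling a full image amounts to conditioning on the coordinate grid of the whole canvas, which is why sampling remains unchanged from the original diffusion model.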
