Class-Balancing Diffusion Models

Recent studies show that diffusion-based models generate high-quality visual data while preserving better diversity. However, this observation has only been validated on curated data distributions, where samples are pre-processed to be uniformly distributed across labels. In practice, long-tailed data distributions are more common, and how diffusion models perform on such class-imbalanced data remains unknown. In this work, we first investigate this problem and observe significant degradation in both diversity and fidelity when a diffusion model is trained on datasets with class-imbalanced distributions. The degradation is most pronounced in tail classes, where generations largely lose diversity and exhibit severe mode collapse. To tackle this problem, we start from the hypothesis that the data distribution is not class-balanced, and propose Class-Balancing Diffusion Models (CBDM), which are trained with a distribution adjustment regularizer. Experiments show that images generated by CBDM exhibit higher diversity and quality, both quantitatively and qualitatively. We benchmark generation results on the CIFAR100 and CIFAR100LT datasets, and the method shows outstanding performance on the downstream recognition task.
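
To make the class-imbalanced setting concrete, the following minimal sketch builds a long-tailed per-class sample profile of the kind commonly used to derive long-tailed variants of CIFAR100. The exponential decay rule, the function name `long_tailed_counts`, and the default values (500 images for the largest class, an imbalance ratio of 0.01) are illustrative assumptions; the abstract does not state how CIFAR100LT was constructed.

```python
def long_tailed_counts(num_classes=100, max_per_class=500, imbalance_ratio=0.01):
    """Return per-class sample counts that decay exponentially from head to tail.

    Assumed convention (not taken from the abstract): class k keeps
    max_per_class * imbalance_ratio ** (k / (num_classes - 1)) samples,
    so the last class keeps roughly max_per_class * imbalance_ratio.
    """
    if num_classes < 2:
        raise ValueError("num_classes must be at least 2")
    counts = []
    for k in range(num_classes):
        frac = k / (num_classes - 1)  # 0.0 for the head class, 1.0 for the tail class
        counts.append(max(1, round(max_per_class * imbalance_ratio ** frac)))
    return counts


counts = long_tailed_counts()
head_to_tail = counts[0] / counts[-1]  # how many times larger the head class is
```

Under these assumed defaults the head class keeps all 500 images while the tail class keeps only a handful, which is the regime in which the abstract reports loss of diversity and mode collapse.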
