Is Synthetic Data From Diffusion Models Ready for Knowledge Distillation?

Diffusion models have recently achieved astonishing performance in generating high-fidelity photo-realistic images. Despite this success, it remains unclear whether synthetic images are suitable for knowledge distillation when real images are unavailable. In this paper, we extensively study whether and how synthetic images produced by state-of-the-art diffusion models can be used for knowledge distillation without access to real images, and we reach three key conclusions: (1) synthetic data from diffusion models can readily achieve state-of-the-art performance among existing synthesis-based distillation methods, (2) low-fidelity synthetic images are better teaching materials, and (3) relatively weak classifiers are better teachers. Code is available at https://github.com/zhengli97/DM-KD.
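
To make the setting concrete, below is a minimal sketch (not the authors' exact training recipe) of label-free knowledge distillation on diffusion-generated images: a pretrained teacher classifier supervises a student via the standard softened-logit KD loss, with no real images or ground-truth labels involved. The `distill_step` helper and the temperature value are illustrative assumptions.

```python
import torch
import torch.nn.functional as F

def kd_loss(student_logits, teacher_logits, temperature=4.0):
    """Hinton-style KD loss: KL divergence between softened teacher and
    student distributions, scaled by T^2 to keep gradient magnitudes stable."""
    p_teacher = F.softmax(teacher_logits / temperature, dim=1)
    log_p_student = F.log_softmax(student_logits / temperature, dim=1)
    return F.kl_div(log_p_student, p_teacher, reduction="batchmean") * temperature ** 2

def distill_step(student, teacher, synthetic_images, optimizer):
    """One optimization step: the student mimics the teacher's predictions
    on a batch of diffusion-generated images (no real data, no labels)."""
    teacher.eval()
    with torch.no_grad():
        teacher_logits = teacher(synthetic_images)
    student_logits = student(synthetic_images)
    loss = kd_loss(student_logits, teacher_logits)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```

In this sketch, `synthetic_images` would come from sampling a text-to-image diffusion model with class-name prompts; the paper's findings suggest that such batches need not be high-fidelity, and that the supervising teacher need not be the strongest available classifier.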
