KD-DLGAN: Data Limited Image Generation via Knowledge Distillation

Generative Adversarial Networks (GANs) rely heavily on large-scale data to train high-quality image generation models. With limited training data, the GAN discriminator often suffers from severe overfitting, which directly degrades generation quality and, in particular, generation diversity. Inspired by recent advances in knowledge distillation (KD), we propose KD-DLGAN, a KD-based generation framework that leverages pre-trained vision-language models to train effective data-limited generation models. KD-DLGAN consists of two innovative designs. The first, aggregated generative KD, mitigates discriminator overfitting by challenging the discriminator with harder learning tasks and distilling more generalizable knowledge from the pre-trained models. The second, correlated generative KD, improves generation diversity by distilling and preserving the diverse image-text correlations within the pre-trained models. Extensive experiments over multiple benchmarks show that KD-DLGAN achieves superior image generation with limited training data. In addition, KD-DLGAN complements the state-of-the-art with consistent and substantial performance gains.
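The idea of preserving a teacher's correlation structure, as in correlated generative KD, can be illustrated with a generic similarity-preserving distillation loss. The sketch below is not the KD-DLGAN formulation; it assumes hypothetical inputs `student_feats` (intermediate discriminator features) and `teacher_feats` (frozen embeddings from a pre-trained vision-language encoder such as CLIP) and simply matches their pairwise cosine-similarity matrices over a batch.

```python
# Minimal sketch (assumptions noted above): match the batch-wise similarity
# structure of a frozen teacher. Not the exact KD-DLGAN objective.
import torch
import torch.nn.functional as F


def pairwise_cosine(x: torch.Tensor) -> torch.Tensor:
    """Return the B x B cosine-similarity matrix of a batch of features."""
    x = F.normalize(x.flatten(1), dim=1)
    return x @ x.t()


def correlation_distillation_loss(student_feats: torch.Tensor,
                                  teacher_feats: torch.Tensor) -> torch.Tensor:
    """Penalize mismatch between student and (frozen) teacher similarity matrices."""
    s_sim = pairwise_cosine(student_feats)
    with torch.no_grad():
        t_sim = pairwise_cosine(teacher_feats)
    return F.mse_loss(s_sim, t_sim)


if __name__ == "__main__":
    # Toy check with random features (batch of 8).
    student = torch.randn(8, 256, requires_grad=True)   # discriminator features
    teacher = torch.randn(8, 512)                       # e.g. CLIP image embeddings
    loss = correlation_distillation_loss(student, teacher)
    loss.backward()
    print(float(loss))
```

In practice such a term would be added to the discriminator loss with a weighting coefficient, so that the discriminator is regularized toward the teacher's correlation structure rather than memorizing the limited training set.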
