Enhancing Data-Free Adversarial Distillation with Activation Regularization and Virtual Interpolation

Knowledge distillation transfers the knowledge of a large trained model, or an ensemble of trained models, to a small model. It relies on access to the original training set, which is not always available. A possible solution is a data-free adversarial distillation framework, which deploys a generative network to transfer the teacher model’s knowledge to the student model. However, data generation in data-free adversarial distillation is inefficient. We add an activation regularizer and a virtual interpolation method to improve data generation efficiency. The activation regularizer enables the student to match the teacher’s predictions close to activation boundaries and decision boundaries. The virtual interpolation method generates virtual samples and labels between decision boundaries. Our experiments show that our approach surpasses state-of-the-art data-free distillation methods: the student model achieves 95.42% accuracy on CIFAR-10 and 77.05% accuracy on CIFAR-100 without any original training data, and on CIFAR-100 its accuracy is 13.8% higher than that of the state-of-the-art data-free method.
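
The abstract does not give implementation details, but the virtual interpolation step is closely related to mixup-style interpolation applied to generator outputs. The following is a minimal, hypothetical PyTorch sketch, not the paper’s actual method: the names `generator`, `teacher`, `student`, the noise dimension `z_dim`, the Beta parameter `alpha`, and the KL matching loss are all assumptions introduced for illustration.

```python
import torch
import torch.nn.functional as F

def virtual_interpolation_step(generator, teacher, student,
                               batch_size, z_dim, alpha=1.0, device="cpu"):
    """One hypothetical data-free distillation step: synthesize samples with the
    generator, build mixup-style virtual samples, and train the student to match
    the teacher's interpolated predictions (KL divergence is an assumed loss)."""
    # Draw random noise and synthesize a batch of samples.
    z = torch.randn(batch_size, z_dim, device=device)
    x = generator(z)

    # Mixup-style virtual interpolation between shuffled pairs of samples.
    lam = torch.distributions.Beta(alpha, alpha).sample().item()
    perm = torch.randperm(batch_size, device=device)
    x_mix = lam * x + (1.0 - lam) * x[perm]

    with torch.no_grad():
        # Interpolated soft labels from the teacher's predictions on the originals.
        t_prob = F.softmax(teacher(x), dim=1)
        t_soft = lam * t_prob + (1.0 - lam) * t_prob[perm]

    # Student is trained to match the teacher's interpolated predictions.
    s_log_prob = F.log_softmax(student(x_mix), dim=1)
    loss = F.kl_div(s_log_prob, t_soft, reduction="batchmean")
    return loss
```

Interpolating the teacher’s probabilities rather than hard labels keeps the targets smooth between decision boundaries, which is consistent with the abstract’s description of virtual samples and labels generated in-between decision boundaries.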
