DisGUIDE: Disagreement-Guided Data-Free Model Extraction

Recent model-extraction attacks on Machine Learning as a Service (MLaaS) systems have moved toward data-free approaches, demonstrating the feasibility of stealing models trained on difficult-to-access data. However, these attacks remain ineffective or impractical because of the low accuracy of the extracted models and the large number of queries they issue to the model under attack. The high query cost makes such techniques infeasible against online MLaaS systems that charge per query. We propose a novel approach that achieves higher accuracy and query efficiency than prior data-free model extraction techniques. Specifically, we introduce a generator training scheme that maximizes a disagreement loss between two clone models attempting to copy the model under attack. This loss, combined with a diversity loss and experience replay, enables the generator to produce more informative training instances for the clone models. Our evaluation on the popular CIFAR-10 and CIFAR-100 datasets shows that our approach improves final clone accuracy by up to 3.42% and 18.48%, respectively, and reduces the average number of queries required to reach the accuracy of the prior state of the art by up to 64.95%. We hope this work promotes future research on practical data-free model extraction and on defenses against such attacks.
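To make the core idea concrete, below is a minimal PyTorch-style sketch of one disagreement-guided generator update under stated assumptions: the names (`generator`, `clone1`, `clone2`), the L1-distance-between-softmax disagreement measure, and the entropy-based diversity term are illustrative choices, not necessarily the paper's exact formulation, and the experience-replay buffer is omitted for brevity.

```python
import torch
import torch.nn.functional as F

def disagreement_loss(logits1, logits2):
    """Reward generated samples on which the two clones disagree.

    Disagreement is measured here as the L1 distance between the clones'
    softmax outputs (an assumption; the paper's exact metric may differ).
    The sign is flipped so that minimizing this loss maximizes disagreement.
    """
    p1 = F.softmax(logits1, dim=1)
    p2 = F.softmax(logits2, dim=1)
    return -(p1 - p2).abs().sum(dim=1).mean()

def diversity_loss(mean_probs):
    """Encourage diverse predicted classes across the generated batch.

    Uses the negative entropy of the batch-averaged class distribution
    (one common form of diversity regularization; also an assumption).
    """
    return (mean_probs * torch.log(mean_probs + 1e-8)).sum()

def generator_step(generator, clone1, clone2, optimizer,
                   batch_size=256, latent_dim=100,
                   div_weight=1.0, device="cpu"):
    """One generator update: synthesize queries the clones disagree on,
    while keeping the predicted class distribution diverse."""
    z = torch.randn(batch_size, latent_dim, device=device)
    x = generator(z)
    logits1, logits2 = clone1(x), clone2(x)
    mean_probs = (F.softmax(logits1, dim=1) + F.softmax(logits2, dim=1)) / 2
    loss = (disagreement_loss(logits1, logits2)
            + div_weight * diversity_loss(mean_probs))
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```

In a full attack loop, the generated batch `x` would then be labeled by querying the black-box victim model, stored in a replay buffer, and used (together with replayed past batches) to train both clones to match the victim's outputs.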
