Prediction Poisoning: Towards Defenses Against DNN Model Stealing Attacks

High-performance Deep Neural Networks (DNNs) are increasingly deployed in many real-world applications, e.g., cloud prediction APIs. Recent advances in model functionality stealing attacks via black-box access (i.e., inputs in, predictions out) threaten the business model of such applications, which require substantial time, money, and effort to develop. Existing defenses take a passive role against stealing attacks, such as by truncating the predicted information. We find such passive defenses ineffective against DNN stealing attacks. In this paper, we propose the first defense that actively perturbs predictions with the goal of poisoning the training objective of the attacker. We find our defense effective across a wide range of challenging datasets and DNN model stealing attacks, and show that it additionally outperforms existing defenses. Our defense is the first that can withstand highly accurate model stealing attacks for tens of thousands of queries, amplifying the attacker's error rate up to a factor of 85$\times$ with minimal impact on the utility for benign users.
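To make the idea concrete, the sketch below illustrates one way a prediction-poisoning defense of this flavor could perturb a served posterior: it searches the probability simplex for a target that makes the attacker's cross-entropy gradient point away from the gradient induced by the clean prediction, then mixes that target with the clean posterior under an L1 utility budget. This is a minimal, simplified illustration (in PyTorch), not the paper's exact algorithm: the gradient is taken with respect to surrogate logits rather than surrogate parameters, and the names `poison_prediction`, `victim_logits`, `surrogate_probs`, and `eps` are assumptions made for illustration.

```python
import torch
import torch.nn.functional as F


def poison_prediction(victim_logits: torch.Tensor,
                      surrogate_probs: torch.Tensor,
                      eps: float = 0.5) -> torch.Tensor:
    """Perturb the victim's posterior within an L1 budget so that the
    attacker's cross-entropy gradient is misdirected (simplified,
    logit-space heuristic; not the paper's exact method)."""
    y = F.softmax(victim_logits, dim=-1)   # clean posterior normally served
    g_clean = surrogate_probs - y          # CE gradient w.r.t. surrogate logits for target y

    # Search simplex vertices (one-hot targets) for the one whose induced
    # gradient is most misaligned (lowest cosine similarity) with g_clean.
    best_vertex, best_cos = None, float("inf")
    for c in range(y.numel()):
        v = torch.zeros_like(y)
        v[c] = 1.0
        g_pois = surrogate_probs - v       # gradient the attacker would see for target v
        cos = F.cosine_similarity(g_pois, g_clean, dim=-1)
        if cos < best_cos:
            best_cos, best_vertex = cos.item(), v

    # Mix the clean posterior with the worst-case vertex so that
    # ||y_tilde - y||_1 <= eps; alpha trades off utility vs. defense strength.
    alpha = min(1.0, eps / (torch.norm(best_vertex - y, p=1).item() + 1e-12))
    return (1.0 - alpha) * y + alpha * best_vertex
```

Under this sketch, the defender would answer a query x with poison_prediction applied to the victim's logits and the softmax output of a defender-maintained surrogate of the attacker, rather than returning the raw softmax posterior; `eps` caps how far the served prediction may drift from the clean one, which is where the utility/defense trade-off enters.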
