Prediction Poisoning: Utility-Constrained Defenses Against Model Stealing Attacks

With the advances in ML models in recent years, an increasing number of real-world commercial applications and services (e.g., autonomous vehicles, medical equipment, web APIs) are emerging. Recent advances in model functionality stealing attacks via black-box access (i.e., inputs in, predictions out) threaten the business model of such ML applications, which require significant time, money, and effort to develop. In this paper, we address this issue by studying defenses against model stealing attacks, largely motivated by the lack of effective defenses in the literature. We work towards the first defense that introduces targeted perturbations to the model predictions under a utility constraint. Our approach targets the perturbations towards manipulating the training procedure of the attacker. We evaluate our approach on multiple datasets and attack scenarios across a range of utility constraints. Our results show that it is indeed possible to trade off utility (e.g., deviation from the original prediction, test accuracy) to significantly reduce the effectiveness of model stealing attacks.
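
The abstract only sketches the defense at a high level, so below is a minimal, hypothetical illustration (not the paper's actual algorithm) of the utility-constrained perturbation interface it describes: the defender returns a modified posterior that stays within an L1 budget of the original prediction and keeps the top-1 label intact, while the perturbation direction (random here; in the paper it is chosen specifically to manipulate the attacker's training procedure) distorts the signal an attacker would train on. All function and parameter names (perturb_prediction, eps) are illustrative assumptions.

```python
# Hypothetical sketch of a utility-constrained prediction perturbation.
# Not the paper's optimization; the random direction stands in for the
# targeted direction the defense would compute against the attacker.
import numpy as np

def perturb_prediction(p, eps, rng=None):
    """Return q with ||q - p||_1 <= eps, q on the simplex, argmax(q) == argmax(p)."""
    rng = np.random.default_rng() if rng is None else rng
    p = np.asarray(p, dtype=float)
    top = int(np.argmax(p))

    # Candidate perturbation direction: sum-zero and unit L1 norm, so that
    # p + eps * direction moves exactly eps in L1 before projection.
    direction = rng.normal(size=p.shape)
    direction -= direction.mean()
    direction /= np.abs(direction).sum()

    # Apply the perturbation and project back onto the probability simplex.
    step = eps
    q = np.clip(p + step * direction, 0.0, None)
    q /= q.sum()

    # Enforce the utility constraints: shrink the step if the top-1 label
    # changed or the L1 budget was exceeded after projection.
    while (np.argmax(q) != top or np.abs(q - p).sum() > eps) and step > 1e-6:
        step *= 0.5
        q = np.clip(p + step * direction, 0.0, None)
        q /= q.sum()

    ok = np.argmax(q) == top and np.abs(q - p).sum() <= eps + 1e-9
    return q if ok else p

if __name__ == "__main__":
    p = np.array([0.6, 0.3, 0.1])
    q = perturb_prediction(p, eps=0.5)
    print(q, np.abs(q - p).sum())
```

In a deployed prediction API, a routine of this shape would wrap every posterior returned to untrusted queries; tightening eps trades defense strength for fidelity to the original predictions, which is the utility trade-off the abstract refers to.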
