PRADA: Protecting Against DNN Model Stealing Attacks

Machine learning (ML) applications are increasingly prevalent. Protecting the confidentiality of ML models is becoming paramount for two reasons: (a) a model can be a business advantage to its owner, and (b) an adversary may use a stolen model to find transferable adversarial examples that can evade classification by the original model. Access to a model can be restricted so that it is available only via a well-defined prediction API. Nevertheless, prediction APIs still leak enough information to allow an adversary to mount model extraction attacks by sending repeated queries through the API. In this paper, we describe new model extraction attacks that use novel approaches for generating synthetic queries and for optimizing training hyperparameters. Our attacks outperform state-of-the-art model extraction in terms of transferability of both targeted and non-targeted adversarial examples (up to +29-44 percentage points, pp) and prediction accuracy (up to +46 pp) on two datasets. We provide take-aways on how to perform effective model extraction attacks. We then propose PRADA, a first step towards generic and effective detection of DNN model extraction attacks. PRADA analyzes the distribution of consecutive API queries and raises an alarm when this distribution deviates from benign behavior. We show that PRADA can detect all prior model extraction attacks with no false positives.
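To make the detection idea concrete, below is a minimal, hypothetical sketch of a PRADA-style detector, not the authors' reference implementation. It assumes that queries arrive as flat feature vectors, that the minimum L2 distances between a client's consecutive queries and its query history are roughly normally distributed for benign clients, and that a rejected Shapiro-Wilk normality test is treated as deviation from benign behavior. The class name `QueryDistributionDetector` and its parameters are illustrative.

```python
# Hypothetical sketch of a PRADA-style detector (illustrative, not the paper's code).
import numpy as np
from scipy.stats import shapiro


class QueryDistributionDetector:
    def __init__(self, min_samples=30, alpha=0.05):
        self.queries = []        # per-client history of observed queries
        self.distances = []      # min L2 distance of each new query to the history
        self.min_samples = min_samples
        self.alpha = alpha       # significance level for the normality test

    def observe(self, query):
        """Record one query; return True if the client's query stream looks anomalous."""
        query = np.asarray(query, dtype=float).ravel()
        if self.queries:
            # Distance from the new query to the closest previously seen query.
            dists = np.linalg.norm(np.stack(self.queries) - query, axis=1)
            self.distances.append(dists.min())
        self.queries.append(query)

        if len(self.distances) < self.min_samples:
            return False  # not enough evidence yet to test the distribution

        # Benign query streams are assumed to yield near-normal distance
        # distributions; synthetic extraction queries distort this, so a
        # rejected normality test raises an alarm.
        _, p_value = shapiro(self.distances)
        return p_value < self.alpha
```

A deployment would keep one such detector per API client and call `observe` on every prediction request, acting (e.g. throttling or blocking) once it returns True.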
