Extraction of Complex DNN Models: Real Threat or Boogeyman?

In recent years, machine learning (ML) has brought advanced solutions to many domains. Because ML models give their owners a business advantage, protecting the intellectual property embodied in these models has become an important consideration. The confidentiality of an ML model can be protected by exposing it to clients only through a prediction API. However, model extraction attacks can steal the functionality of an ML model using the information leaked to clients through the results returned by that API. In this work, we question whether model extraction is a serious threat to complex, real-life ML models. We evaluate the current state-of-the-art model extraction attack, Knockoff nets, against complex models. We reproduce and confirm the results reported in the original paper, but we also show that the attack's performance can be limited by several factors, including the victim model's architecture and the granularity of the API response. Furthermore, we introduce a defense based on distinguishing the queries used by Knockoff nets from benign queries. Despite the limitations of Knockoff nets, we show that a more realistic adversary can effectively steal complex ML models while evading known defenses.
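
To make the attack pattern discussed above concrete, the following is a minimal, illustrative sketch of a Knockoff-nets-style extraction loop, not the authors' implementation. It assumes a PyTorch setting, a hypothetical `query_victim` function wrapping the victim's prediction API, and a public transfer set of unlabeled natural images; all names, architectures, and hyperparameters are placeholders.

```python
# Illustrative sketch of a Knockoff-nets-style extraction attack (assumptions noted above).
import torch
import torch.nn.functional as F
from torchvision import models


def build_transfer_set(transfer_loader, query_victim, device="cpu"):
    """Query the victim once per transfer image and cache its soft predictions.

    `transfer_loader` is assumed to yield (images, _) batches from a public,
    unlabeled image collection; the labels (if any) are ignored.
    """
    pairs = []
    for images, _ in transfer_loader:
        with torch.no_grad():
            # API response from the black-box victim. Response granularity matters:
            # a full probability vector leaks far more than a top-1 label would.
            probs = query_victim(images.to(device))
        pairs.append((images.cpu(), probs.cpu()))
    return pairs


def train_surrogate(pairs, num_classes, epochs=10, device="cpu"):
    """Distill the cached victim predictions into a surrogate model."""
    surrogate = models.resnet34(num_classes=num_classes).to(device)
    opt = torch.optim.Adam(surrogate.parameters(), lr=1e-4)
    for _ in range(epochs):
        for images, victim_probs in pairs:
            images = images.to(device)
            victim_probs = victim_probs.to(device)
            logits = surrogate(images)
            # Match the surrogate's output distribution to the victim's soft labels.
            loss = F.kl_div(F.log_softmax(logits, dim=1), victim_probs,
                            reduction="batchmean")
            opt.zero_grad()
            loss.backward()
            opt.step()
    return surrogate
```

The sketch also shows why the granularity of the API response is a limiting factor: if `victim_probs` is reduced to a one-hot top-1 label, the distillation signal available to the surrogate shrinks accordingly. Likewise, because the transfer images come from a public dataset rather than the victim's task distribution, a defender can attempt to distinguish such queries from benign ones, which is the intuition behind the detection defense mentioned above.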
