A framework for the extraction of Deep Neural Networks by leveraging public data

Machine learning models trained on confidential datasets are increasingly being deployed for profit. Machine Learning as a Service (MLaaS) has made such models easily accessible to end-users. Prior work has developed model extraction attacks, in which an adversary extracts an approximation of an MLaaS model by making black-box queries to it. However, none of these works satisfies all three essential criteria for practical model extraction: (1) the ability to work on deep learning models, (2) no requirement of domain knowledge, and (3) the ability to work within a limited query budget. We design a model extraction framework that uses active learning and large public datasets to satisfy all three. We demonstrate that this framework can be used to steal deep classifiers trained on a variety of datasets from the image and text domains. By querying a model via black-box access for only its top prediction, our framework improves performance over a uniform noise baseline by 4.70x on average for image tasks and 2.11x for text tasks, while using only 30% (30,000 samples) of the public dataset at its disposal.
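
The sketch below is a rough, minimal illustration of the kind of query-efficient extraction loop the abstract describes, not the paper's actual algorithm. The names query_victim_top1 and public_pool are hypothetical stand-ins for the MLaaS prediction API and the public dataset, least-confidence uncertainty sampling is used as one possible active-learning strategy, and a small scikit-learn MLP stands in for the deep substitute network.

    # Sketch: extract a substitute classifier by querying a black-box victim
    # for top-1 labels on actively selected samples from a public data pool.
    import numpy as np
    from sklearn.neural_network import MLPClassifier

    def extract(query_victim_top1, public_pool, query_budget=30_000,
                rounds=10, seed_size=1_000):
        """Iteratively label actively chosen public samples with the victim's
        top prediction and retrain the substitute on everything labeled so far."""
        rng = np.random.default_rng(0)
        # Seed round: label a small random subset of the public pool.
        idx = rng.choice(len(public_pool), size=seed_size, replace=False)
        X = public_pool[idx]
        y = np.array([query_victim_top1(x) for x in X])
        remaining = np.setdiff1d(np.arange(len(public_pool)), idx)

        substitute = MLPClassifier(hidden_layer_sizes=(256, 128), max_iter=200)
        per_round = (query_budget - seed_size) // rounds
        for _ in range(rounds):
            substitute.fit(X, y)
            # Least-confidence sampling: pick pool points where the substitute's
            # highest predicted class probability is lowest.
            probs = substitute.predict_proba(public_pool[remaining])
            uncertain = np.argsort(probs.max(axis=1))[:per_round]
            chosen = remaining[uncertain]
            X = np.vstack([X, public_pool[chosen]])
            y = np.concatenate([y, [query_victim_top1(x) for x in public_pool[chosen]]])
            remaining = np.setdiff1d(remaining, chosen)
        substitute.fit(X, y)
        return substitute

The budget split between the seed set and the active-learning rounds, as well as the sampling strategy, are free design choices; the abstract only fixes the overall budget (30,000 public samples) and the black-box, top-prediction-only query interface.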
