ActiveThief: Model Extraction Using Active Learning and Unannotated Public Data

Machine learning models are increasingly being deployed in practice. Machine Learning as a Service (MLaaS) providers expose such models to queries by third-party developers through application programming interfaces (APIs). Prior work has developed model extraction attacks, in which an attacker extracts an approximation of an MLaaS model by making black-box queries to it. We design ActiveThief – a model extraction framework for deep neural networks that makes use of active learning techniques and unannotated public datasets to perform model extraction. It does not expect strong domain knowledge or access to annotated data on the part of the attacker. We demonstrate that (1) it is possible to use ActiveThief to extract deep classifiers trained on a variety of datasets from image and text domains, while querying the model with as few as 10-30% of samples from public datasets, (2) the resulting model exhibits a higher transferability success rate of adversarial examples than prior work, and (3) the attack evades detection by the state-of-the-art model extraction detection method, PRADA.

[1]  Kemal Davaslioglu,et al.  Generative Adversarial Networks for Black-Box API Attacks with Limited Training Data , 2018, 2018 IEEE International Symposium on Signal Processing and Information Technology (ISSPIT).

[2]  Frank Hutter,et al.  A Downsampled Variant of ImageNet as an Alternative to the CIFAR datasets , 2017, ArXiv.

[3]  Samuel Marchal,et al.  PRADA: Protecting Against DNN Model Stealing Attacks , 2018, 2019 IEEE European Symposium on Security and Privacy (EuroS&P).

[4]  Somesh Jha,et al.  Exploring Connections Between Active Learning and Model Extraction , 2018, USENIX Security Symposium.

[5]  Benjamin Edwards,et al.  Defending Against Model Stealing Attacks Using Deceptive Perturbations , 2018, ArXiv.

[6]  Jimmy Ba,et al.  Adam: A Method for Stochastic Optimization , 2014, ICLR.

[7]  Alex Krizhevsky,et al.  Learning Multiple Layers of Features from Tiny Images , 2009 .

[8]  Vijay Arya,et al.  Model Extraction Warning in MLaaS Paradigm , 2017, ACSAC.

[9]  Alberto Ferreira de Souza,et al.  Copycat CNN: Stealing Knowledge by Persuading Confession with Random Non-Labeled Data , 2018, 2018 International Joint Conference on Neural Networks (IJCNN).

[10]  Geoffrey E. Hinton,et al.  Distilling the Knowledge in a Neural Network , 2015, ArXiv.

[11]  Jeffrey Dean,et al.  Efficient Estimation of Word Representations in Vector Space , 2013, ICLR.

[12]  Seyed-Mohsen Moosavi-Dezfooli,et al.  DeepFool: A Simple and Accurate Method to Fool Deep Neural Networks , 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[13]  Bo Pang,et al.  Seeing Stars: Exploiting Class Relationships for Sentiment Categorization with Respect to Rating Scales , 2005, ACL.

[14]  Ananthram Swami,et al.  Practical Black-Box Attacks against Machine Learning , 2016, AsiaCCS.

[15]  Michael S. Bernstein,et al.  ImageNet Large Scale Visual Recognition Challenge , 2014, International Journal of Computer Vision.

[16]  Seong Joon Oh,et al.  Towards Reverse-Engineering Black-Box Neural Networks , 2017, ICLR.

[17]  Yi Shi,et al.  How to steal a machine learning classifier with deep learning , 2017, 2017 IEEE International Symposium on Technologies for Homeland Security (HST).

[18]  Kemal Davaslioglu,et al.  Active Deep Learning Attacks under Strict Rate Limitations for Online API Calls , 2018, 2018 IEEE International Symposium on Technologies for Homeland Security (HST).

[19]  Yoshua Bengio,et al.  Gradient-based learning applied to document recognition , 1998, Proc. IEEE.

[20]  Richard Socher,et al.  Pointer Sentinel Mixture Models , 2016, ICLR.

[21]  Valentina Emilia Balas,et al.  Stealing Neural Networks via Timing Side Channels , 2018, ArXiv.

[22]  Mehmed M. Kantardzic,et al.  Data Driven Exploratory Attacks on Black Box Classifiers in Adversarial Domains , 2017, Neurocomputing.

[23]  William A. Gale,et al.  A sequential algorithm for training text classifiers , 1994, SIGIR '94.

[24]  Tribhuvanesh Orekondy,et al.  Knockoff Nets: Stealing Functionality of Black-Box Models , 2018, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[25]  Fan Zhang,et al.  Stealing Machine Learning Models via Prediction APIs , 2016, USENIX Security Symposium.

[26]  Johannes Stallkamp,et al.  Man vs. computer: Benchmarking machine learning algorithms for traffic sign recognition , 2012, Neural Networks.

[27]  Frédéric Precioso,et al.  Adversarial Active Learning for Deep Networks: a Margin Based Approach , 2018, ArXiv.

[28]  Joan Bruna,et al.  Intriguing properties of neural networks , 2013, ICLR.

[29]  Christopher Potts,et al.  Learning Word Vectors for Sentiment Analysis , 2011, ACL.

[30]  Konrad Rieck,et al.  Forgotten Siblings: Unifying Attacks on Machine Learning and Digital Watermarking , 2018, 2018 IEEE European Symposium on Security and Privacy (EuroS&P).

[31]  Silvio Savarese,et al.  Active Learning for Convolutional Neural Networks: A Core-Set Approach , 2017, ICLR.

[32]  Yoon Kim,et al.  Convolutional Neural Networks for Sentence Classification , 2014, EMNLP.

[33]  Jonathon Shlens,et al.  Explaining and Harnessing Adversarial Examples , 2014, ICLR.