Knockoff Nets: Stealing Functionality of Black-Box Models

Machine Learning (ML) models are increasingly deployed in the wild to perform a wide range of tasks. In this work, we ask to what extent an adversary can steal the functionality of such "victim" models based solely on black-box interactions: image in, predictions out. In contrast to prior work, we study complex black-box victim models and an adversary who lacks knowledge of the train/test data used by the model, its internals, and the semantics of its outputs. We formulate model functionality stealing as a two-step approach: (i) querying the black-box model with a set of input images to obtain predictions; and (ii) training a "knockoff" on the resulting image-prediction pairs. We make several remarkable observations: (a) querying random images drawn from a distribution different from that of the black-box training data yields a well-performing knockoff; (b) this holds even when the knockoff uses a different architecture; and (c) our reinforcement learning approach additionally improves query sample efficiency in certain settings and provides performance gains. We validate model functionality stealing on a range of datasets and tasks, and show that a reasonable knockoff of an image analysis API can be created for as little as $30.
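
The two-step formulation above can be illustrated with a short sketch. The snippet below is a minimal, hedged illustration in PyTorch rather than the authors' exact training setup: `victim_api` (a callable returning a probability vector per image), `transfer_images` (images from an arbitrary, unrelated distribution), and `knockoff` (any architecture of the adversary's choosing) are assumed placeholder names, and training here uses plain per-example SGD instead of the paper's full mini-batch pipeline.

```python
# Minimal sketch of black-box functionality stealing (assumptions noted above).
import torch
import torch.nn.functional as F

def build_transfer_set(victim_api, transfer_images):
    """Step (i): query the black box and keep (image, soft prediction) pairs."""
    pairs = []
    for x in transfer_images:
        p = victim_api(x.unsqueeze(0))          # image in, probability vector out
        pairs.append((x, p.squeeze(0).detach()))
    return pairs

def train_knockoff(knockoff, pairs, epochs=10, lr=1e-3):
    """Step (ii): fit the knockoff to the victim's soft predictions (distillation-style)."""
    opt = torch.optim.SGD(knockoff.parameters(), lr=lr, momentum=0.9)
    for _ in range(epochs):
        for x, p in pairs:
            opt.zero_grad()
            logits = knockoff(x.unsqueeze(0))
            # Cross-entropy against the victim's soft prediction vector.
            loss = -(p * F.log_softmax(logits, dim=1)).sum()
            loss.backward()
            opt.step()
    return knockoff
```

In this sketch the adversary never sees the victim's training data, gradients, or architecture; everything it learns comes from the queried image-prediction pairs, which is what makes the observations (a)-(c) above notable.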
