How to steal a machine learning classifier with deep learning

This paper presents an exploratory machine learning attack that uses deep learning to infer the functionality of an arbitrary classifier by polling it as a black box and using the returned labels to build a functionally equivalent machine. Building a classifier is typically costly and time-consuming, because it requires collecting training data (e.g., through crowdsourcing), selecting a suitable machine learning algorithm (through extensive tests and domain-specific knowledge), and optimizing the underlying hyperparameters (with a good understanding of the classifier's structure). In addition, all of this information is typically proprietary and should be protected. With the proposed black-box attack, an adversary can use deep learning to reliably infer the necessary information from labels previously obtained from the classifier under attack and build a functionally equivalent machine learning classifier without knowing the type, structure, or underlying parameters of the original. Results for a text classification application demonstrate that deep learning can infer Naive Bayes and SVM classifiers with high accuracy and steal their functionality. This new attack paradigm introduces additional security challenges for online machine learning algorithms and raises the need for novel mitigation strategies to counteract the high-fidelity inference capability of deep learning.
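The core loop the abstract describes (query the victim classifier as a black box, keep only the returned labels, and fit a surrogate network on those query/label pairs) can be sketched as below. This is a minimal illustration, not the paper's experimental setup: the 20 Newsgroups categories, TF-IDF features, MLP architecture, and query budget are all assumptions chosen for brevity.

```python
# Minimal sketch of the black-box extraction idea described above:
# train a "victim" text classifier, poll it for labels on unlabeled documents,
# and fit a surrogate neural network on the (query, returned-label) pairs.
# All dataset choices and hyperparameters here are illustrative assumptions.
from sklearn.datasets import fetch_20newsgroups
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import MultinomialNB
from sklearn.neural_network import MLPClassifier
from sklearn.metrics import accuracy_score

cats = ["sci.space", "rec.autos"]  # assumed toy task
train = fetch_20newsgroups(subset="train", categories=cats)
test = fetch_20newsgroups(subset="test", categories=cats)

vec = TfidfVectorizer(max_features=5000)
X_victim = vec.fit_transform(train.data)
X_adv = vec.transform(test.data)  # documents the adversary can submit as queries

# Victim: a Naive Bayes text classifier the adversary can only query for labels.
victim = MultinomialNB().fit(X_victim, train.target)

# Split the adversary's documents into a query set and a held-out set
# used to measure functional equivalence on unseen inputs.
X_query, X_holdout = train_test_split(X_adv, test_size=0.5, random_state=0)

# Poll the black box: only the returned labels are observed, nothing else.
query_labels = victim.predict(X_query)

# Surrogate: a feedforward network trained purely on the query/label pairs.
surrogate = MLPClassifier(hidden_layer_sizes=(128,), max_iter=300, random_state=0)
surrogate.fit(X_query, query_labels)

# Agreement between surrogate and victim on held-out inputs approximates
# how well the surrogate reproduces the victim's functionality.
agreement = accuracy_score(victim.predict(X_holdout), surrogate.predict(X_holdout))
print(f"surrogate/victim agreement: {agreement:.3f}")
```

The agreement score on held-out queries is one simple way to quantify how "functionally equivalent" the stolen surrogate is to the victim; the same loop applies unchanged when the victim is an SVM or any other labeling API.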
