This paper proposes a black-box extraction attack on pre-trained image classifiers that rebuilds a functionally equivalent model with high similarity. Common model extraction attacks feed a large number of training samples to the target classifier, which is time-consuming and redundant, and the results depend heavily on the selected training samples and the target model: with an inappropriate sample selection, the extracted model may capture only part of the crucial features. To eliminate these uncertainties, we propose the VAE-kdtree attack model, which removes the strong dependency on the selected training samples and the target model. It not only avoids redundant computation but also extracts critical decision boundaries more accurately in image classification. The VAE-kdtree model achieves around 90% similarity on MNIST and around 80% similarity on Fashion-MNIST against both a target convolutional neural network and a target support vector machine. Its performance could be further improved by adopting a higher-dimensional kd-tree space.
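The kd-tree component of such an attack can be illustrated with a minimal sketch. The idea, under our reading of the abstract, is to index labeled points (e.g., latent codes of query images, labeled by the black-box target's responses) in a kd-tree, then classify new points by nearest-neighbor lookup. Everything below is a hypothetical toy illustration, not the paper's actual implementation: the data, dimensionality, and labeling rule are invented.

```python
# Minimal pure-Python kd-tree sketch for nearest-neighbor lookup over
# labeled points, e.g. latent codes paired with labels obtained by
# querying a black-box target model. Illustrative only.

def build(points, depth=0):
    """Recursively build a kd-tree node: (point, label, left, right)."""
    if not points:
        return None
    k = len(points[0][0])          # dimensionality of the space
    axis = depth % k               # cycle through splitting axes
    points = sorted(points, key=lambda p: p[0][axis])
    mid = len(points) // 2
    point, label = points[mid]
    return (point, label,
            build(points[:mid], depth + 1),
            build(points[mid + 1:], depth + 1))

def nearest(node, query, depth=0, best=None):
    """Return the (point, label) pair stored closest to `query`."""
    if node is None:
        return best
    point, label, left, right = node
    dist = sum((p - q) ** 2 for p, q in zip(point, query))
    if best is None or dist < sum((p - q) ** 2 for p, q in zip(best[0], query)):
        best = (point, label)
    axis = depth % len(query)
    near, far = (left, right) if query[axis] < point[axis] else (right, left)
    best = nearest(near, query, depth + 1, best)
    # Search the far subtree only if the splitting plane is closer
    # than the current best neighbor.
    if (query[axis] - point[axis]) ** 2 < sum((p - q) ** 2 for p, q in zip(best[0], query)):
        best = nearest(far, query, depth + 1, best)
    return best

# Toy labeled points: label 1 if the first coordinate is positive.
data = [((x, y), int(x > 0)) for x, y in
        [(-2.0, 0.5), (-1.0, -1.0), (1.5, 0.2), (2.0, -0.5)]]
tree = build(data)
print(nearest(tree, (1.8, 0.0))[1])  # -> 1 (nearest stored point is in the positive cluster)
```

In the attack setting described by the abstract, the stored points would be VAE latent encodings rather than raw coordinates, and the labels would come from querying the target classifier; raising the dimensionality of the indexed space is the improvement the abstract suggests.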