Delving into Data: Effectively Substitute Training for Black-box Attack

Deep models have been shown to be vulnerable to adversarial samples. In the black-box setting, where the architecture and weights of the attacked model are inaccessible, training a substitute model to craft adversarial attacks has attracted wide attention. Previous substitute training approaches focus on stealing the knowledge of the target model from real training data or synthetic data, without exploring what kind of data can further improve the transferability between the substitute and target models. In this paper, we propose a novel perspective on substitute training that focuses on designing the distribution of the data used in the knowledge-stealing process. Specifically, a diverse data generation module is proposed to synthesize large-scale data with a wide distribution, and an adversarial substitute training strategy is introduced to focus on data distributed near the decision boundary. The combination of these two modules further boosts the consistency between the substitute and target models, which greatly improves the effectiveness of the adversarial attacks. Extensive experiments demonstrate the efficacy of our method against state-of-the-art competitors under non-targeted and targeted attack settings. Detailed visualizations and analyses are also provided to help understand the advantages of our method.
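To make the substitute-training pipeline described above concrete, the following is a minimal sketch of the general idea, assuming a PyTorch setup: synthetic data from a generator is used to query the black-box target, the substitute is fitted to the target's outputs, and samples near the substitute's decision boundary are up-weighted as a simple proxy for boundary-focused training. All class and function names here are hypothetical illustrations of the generic pipeline, not the paper's released code or exact algorithm.

```python
# Illustrative sketch of black-box substitute training (hypothetical names,
# not the authors' implementation).
import torch
import torch.nn as nn
import torch.nn.functional as F


class SmallCNN(nn.Module):
    """Stand-in for both the black-box target and the substitute model."""
    def __init__(self, num_classes=10):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(1, 16, 3, padding=1), nn.ReLU(),
            nn.MaxPool2d(2),
            nn.Conv2d(16, 32, 3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1),
        )
        self.fc = nn.Linear(32, num_classes)

    def forward(self, x):
        return self.fc(self.features(x).flatten(1))


class Generator(nn.Module):
    """Maps noise vectors to synthetic images used to query the target."""
    def __init__(self, z_dim=64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(z_dim, 256), nn.ReLU(),
            nn.Linear(256, 28 * 28), nn.Tanh(),
        )

    def forward(self, z):
        return self.net(z).view(-1, 1, 28, 28)


def substitute_training_step(target, substitute, generator, opt_sub,
                             z_dim=64, batch=64):
    """One knowledge-stealing step: query the target on synthetic data,
    then fit the substitute to the observed outputs."""
    z = torch.randn(batch, z_dim)
    x = generator(z)
    with torch.no_grad():                      # black-box: only outputs observed
        t_probs = F.softmax(target(x), dim=1)
    s_logits = substitute(x)
    # Up-weight samples with a small top-2 logit margin, i.e. samples lying
    # near the substitute's decision boundary (an illustrative proxy only).
    top2 = s_logits.detach().topk(2, dim=1).values
    margin = (top2[:, 0] - top2[:, 1]).clamp(min=1e-3)
    weights = 1.0 / margin
    weights = weights / weights.sum()
    per_sample_kl = F.kl_div(F.log_softmax(s_logits, dim=1), t_probs,
                             reduction='none').sum(dim=1)
    loss = (weights * per_sample_kl).sum()
    opt_sub.zero_grad()
    loss.backward()
    opt_sub.step()
    return loss.item()


if __name__ == "__main__":
    target, substitute, gen = SmallCNN(), SmallCNN(), Generator()
    opt = torch.optim.Adam(substitute.parameters(), lr=1e-3)
    for step in range(5):
        print(substitute_training_step(target, substitute, gen, opt))
```

Once the substitute mimics the target well, standard white-box attacks (e.g., FGSM or PGD) can be run on the substitute and the resulting adversarial examples transferred to the black-box target.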
