You Only Query Once: Effective Black Box Adversarial Attacks with Minimal Repeated Queries

Researchers have repeatedly shown that it is possible to craft adversarial attacks on deep classifiers (small perturbations that significantly change the class label), even in the “black-box” setting where one only has query access to the classifier. However, all prior work in the black-box setting attacks the classifier by repeatedly querying the same image with minor modifications, usually thousands of times or more, making it easy for defenders to detect an ensuing attack. In this work, we instead show that it is possible to craft (universal) adversarial perturbations in the black-box setting by querying a sequence of different images, each only once. This attack evades detection mechanisms that flag large numbers of similar queries and produces a single perturbation that causes misclassification when applied to any input to the classifier. In experiments, we show that attacks adhering to this restriction can produce untargeted adversarial perturbations that fool the vast majority of MNIST and CIFAR-10 classifier inputs, as well as in excess of 60–70% of inputs on ImageNet classifiers. In the targeted setting, we exhibit targeted black-box universal attacks on ImageNet classifiers with success rates above 20% when allowed only one query per image, and 66% when allowed two queries per image.
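
The abstract does not spell out the optimizer, but a natural reading is an evolution-strategies-style search in which each candidate perturbation is scored on a fresh image with a single query, so no image is ever queried twice. The sketch below is a minimal illustration under that assumption; `query_label`, `image_stream`, and all hyperparameters are hypothetical placeholders, not the authors' implementation.

```python
import numpy as np

def query_label(image):
    # Hypothetical black-box interface: one call = one query to the target
    # classifier, returning only the predicted label. Replace with the real API.
    raise NotImplementedError("replace with a single call to the target classifier")

def craft_universal_perturbation(image_stream, eps=0.05, pop_size=20,
                                 sigma=0.01, lr=0.01, shape=(3, 32, 32)):
    """Sketch of a one-query-per-image universal attack (untargeted).

    Each iteration draws `pop_size` fresh (image, true_label) pairs from
    `image_stream`; every image is queried exactly once with a different
    perturbed candidate, and the misclassification feedback drives an
    NES-style update of a single shared perturbation `delta`.
    """
    delta = np.zeros(shape)
    rng = np.random.default_rng(0)
    while True:
        try:
            batch = [next(image_stream) for _ in range(pop_size)]
        except StopIteration:
            break  # out of unseen images
        noise = rng.standard_normal((pop_size,) + shape)
        scores = np.empty(pop_size)
        for i, (image, true_label) in enumerate(batch):
            candidate = np.clip(delta + sigma * noise[i], -eps, eps)
            pred = query_label(np.clip(image + candidate, 0.0, 1.0))
            # Reward candidates that flip the label.
            scores[i] = 1.0 if pred != true_label else 0.0
        if scores.std() > 0:
            scores = (scores - scores.mean()) / scores.std()
        # Gradient-free estimate of how the fooling rate changes with delta.
        grad = (scores[:, None, None, None] * noise).mean(axis=0) / sigma
        delta = np.clip(delta + lr * grad, -eps, eps)
    return delta
```

Because every score comes from a different image, a detector looking for many near-duplicate queries never sees the same input twice, while the shared `delta` is still driven toward a perturbation that transfers across inputs.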
