HopSkipJumpAttack: A Query-Efficient Decision-Based Attack

The goal of a decision-based adversarial attack on a trained model is to generate adversarial examples based solely on observing output labels returned by the targeted model. We develop HopSkipJumpAttack, a family of algorithms based on a novel estimate of the gradient direction using binary information at the decision boundary. The proposed family includes both untargeted and targeted attacks, optimized for the ℓ2 and ℓ∞ similarity metrics respectively. Theoretical analysis is provided for the proposed algorithms and the gradient direction estimate. Experiments show that HopSkipJumpAttack requires significantly fewer model queries than several state-of-the-art decision-based adversarial attacks. It also achieves competitive performance in attacking several widely used defense mechanisms.
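
The abstract's central idea, estimating the gradient direction from hard-label (binary) queries near the decision boundary, can be illustrated with a short sketch. The snippet below is a minimal illustration under stated assumptions, not the paper's reference implementation: `model_decision` is a hypothetical callable returning True when the queried point is classified as adversarial, and the defaults for the query budget and the perturbation radius `delta` are placeholders.

```python
import numpy as np

def estimate_gradient_direction(model_decision, x, num_queries=100, delta=0.01, seed=None):
    """Estimate the gradient direction at a boundary point x from binary feedback only."""
    rng = np.random.default_rng(seed)
    d = x.size
    # Draw random directions uniformly on the unit sphere.
    u = rng.standard_normal((num_queries, d))
    u /= np.linalg.norm(u, axis=1, keepdims=True)
    # Query the model with small perturbations; record +1 if the perturbed
    # point is adversarial, -1 otherwise.
    phi = np.array([
        1.0 if model_decision(x + delta * u_b.reshape(x.shape)) else -1.0
        for u_b in u
    ])
    # Subtract the empirical mean of the binary outcomes (a variance-reduction
    # baseline), then average the signed directions.
    phi -= phi.mean()
    v = (phi[:, None] * u).mean(axis=0)
    norm = np.linalg.norm(v)
    return (v / norm).reshape(x.shape) if norm > 0 else v.reshape(x.shape)
```

The baseline subtraction reflects the intuition behind the estimator: at a point exactly on the boundary, roughly half of the random perturbations land on the adversarial side, so averaging the signed directions points toward the adversarial region, and the correction mainly compensates when the point sits slightly off the boundary.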
