Evading Black-box Classifiers Without Breaking Eggs

Decision-based evasion attacks repeatedly query a black-box classifier to generate adversarial examples. Prior work measures the cost of such attacks by the total number of queries made to the classifier. We argue this metric is flawed. Most security-critical machine learning systems aim to weed out "bad" data (e.g., malware, harmful content, etc.). Queries to such systems carry a fundamentally asymmetric cost: queries detected as "bad" come at a higher cost because they trigger additional security filters, e.g., usage throttling or account suspension. Yet, we find that existing decision-based attacks issue a large number of "bad" queries, which likely renders them ineffective against security-critical systems. We then design new attacks that reduce the number of bad queries by $1.5$–$7.3\times$, but often at a significant increase in total (non-bad) queries. We thus pose it as an open problem to build black-box attacks that are more effective under realistic cost metrics.
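To make the asymmetric cost model concrete, below is a minimal sketch of how one might account for the two query budgets separately when evaluating a decision-based attack. Every name here (`AsymmetricQueryCounter`, `classify`, `bad_label`, `bad_cost`) is a hypothetical illustration of the cost metric described in the abstract, not code or parameters from the paper.

```python
# A minimal sketch of asymmetric query-cost accounting: benign and flagged
# queries to the black-box classifier are tallied separately, since flagged
# queries carry a much higher cost. All names are illustrative assumptions.

from dataclasses import dataclass
from typing import Any, Callable


@dataclass
class AsymmetricQueryCounter:
    """Wraps a black-box classifier and tallies benign vs. flagged queries."""

    classify: Callable[[Any], str]  # black box: input -> predicted label
    bad_label: str = "bad"          # the class the system tries to weed out
    good_queries: int = 0
    bad_queries: int = 0

    def query(self, x: Any) -> str:
        label = self.classify(x)
        if label == self.bad_label:
            # Queries flagged as "bad" trigger extra security filters
            # (throttling, account suspension), so they are the expensive
            # resource an attacker wants to conserve.
            self.bad_queries += 1
        else:
            self.good_queries += 1
        return label

    def cost(self, bad_cost: float = 100.0) -> float:
        # One flagged query counts as many benign ones; the weight of 100
        # is an arbitrary assumption for illustration, not from the paper.
        return self.good_queries + bad_cost * self.bad_queries
```

Under such a metric, an attack that cuts flagged queries by, say, $2\times$ can come out ahead even if it issues far more benign queries, which is precisely the trade-off reported above: $1.5$–$7.3\times$ fewer bad queries at the price of a significant increase in total queries.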
