Generating End-to-End Adversarial Examples for Malware Classifiers Using Explainability

In recent years, explainable machine learning (ML) has been researched extensively. Until now, this research has focused on the use cases of regular ML users, such as debugging an ML model. This paper takes a different stance and shows that adversaries can leverage explainable ML to bypass malware classifiers that use multiple feature types. Previous adversarial attacks against such classifiers only add new features rather than modifying existing ones, in order to avoid harming the functionality of the modified malware executable. Current attacks use a single algorithm that both selects which features to modify and modifies them blindly, treating all features the same. In this paper, we present a different approach. We split the adversarial example generation task into two parts: first, we find the importance of all features for a specific sample using explainability algorithms, and then we perform a feature-specific modification, feature by feature. To apply our attack in black-box scenarios, we introduce the concept of transferability of explainability, that is, applying explainability algorithms to different classifiers, which use different feature subsets and are trained on different datasets, still results in a similar subset of important features. We conclude that explainability algorithms can be leveraged by adversaries, and thus advocates of training more interpretable classifiers should consider the trade-off that such classifiers may be more vulnerable to adversarial attacks.
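The two-step structure described above (per-sample feature ranking via explainability, followed by feature-specific modification) can be illustrated with a minimal sketch. Note that everything concrete here is an assumption for illustration: the synthetic data, the occlusion-style attribution used as a stand-in for SHAP or Integrated Gradients, the benign-mean baseline, and the hard-coded set of "modifiable" features are not the paper's actual datasets, explainability algorithms, or feature-modification procedures.

```python
# Sketch of the two-step attack idea: (1) rank features for a single sample with an
# explainability method, (2) modify only the most important, modifiable features,
# one at a time, until the classifier's decision flips. Illustration only.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

# Synthetic stand-in for a static-feature malware dataset (label 1 = "malware").
X, y = make_classification(n_samples=2000, n_features=20, n_informative=8, random_state=0)
clf = RandomForestClassifier(n_estimators=200, random_state=0).fit(X, y)

def occlusion_importance(model, x, baseline):
    """Per-sample attribution: drop in the malware score when each feature is
    replaced by a baseline value (a simple proxy for SHAP-style attribution)."""
    base_score = model.predict_proba(x.reshape(1, -1))[0, 1]
    scores = np.empty(x.shape[0])
    for i in range(x.shape[0]):
        perturbed = x.copy()
        perturbed[i] = baseline[i]
        scores[i] = base_score - model.predict_proba(perturbed.reshape(1, -1))[0, 1]
    return scores  # positive = feature pushes this sample toward "malware"

# Pick one sample classified as malware and attack it.
malware_idx = np.where(clf.predict(X) == 1)[0][0]
x_adv = X[malware_idx].copy()
baseline = X[y == 0].mean(axis=0)   # benign "prototype" used as target feature values
modifiable = set(range(10))         # assumption: only some features can be changed
                                    # without breaking the executable's functionality

# Step 1: explainability-based feature ranking for this specific sample.
ranking = np.argsort(-occlusion_importance(clf, x_adv, baseline))

# Step 2: feature-specific modification, feature by feature, most important first.
for i in ranking:
    if i not in modifiable:
        continue
    x_adv[i] = baseline[i]          # a real attack would apply a per-feature-type edit
    if clf.predict_proba(x_adv.reshape(1, -1))[0, 1] < 0.5:
        print(f"evaded after modifying feature {i}")
        break
```

In a black-box setting, the same ranking step would be run on a surrogate classifier, relying on the transferability of explainability described above, i.e., the assumption that the surrogate and the target largely agree on which features matter for a given sample.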
