Can We Trust Your Explanations? Sanity Checks for Interpreters in Android Malware Analysis

With the rapid growth of Android malware, many machine-learning-based malware analysis approaches have been proposed to mitigate the threat. However, such classifiers are opaque and non-intuitive, making it difficult for analysts to understand the reasoning behind their decisions. For this reason, a variety of explanation approaches have been proposed that interpret predictions by identifying important features. Unfortunately, in the malware analysis domain these explanation results rarely agree with one another, leaving analysts unsure whether they can be trusted. In this work, we propose principled guidelines for assessing the quality of five explanation approaches, designing three critical quantitative metrics that measure their stability, robustness, and effectiveness. Furthermore, we collect five widely used malware datasets and apply the explanation approaches to them in two tasks: malware detection and familial identification. Based on the generated explanation results, we conduct a sanity check of these explanation approaches in terms of the three metrics. The results demonstrate that our metrics can assess the explanation approaches and help us identify the most typical malicious behaviors for malware analysis.

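The abstract does not reproduce the paper's definitions of the three metrics, so the sketch below is only a plausible illustration of what a stability check could look like: re-run a stochastic explainer (e.g., a LIME-style perturbation sampler) several times on the same sample and measure how consistent its top-k feature sets are via average pairwise Jaccard similarity. The function names (`stability`, `noisy_explainer`) and the choices k=10 and runs=10 are hypothetical, not taken from the paper.

```python
# Hypothetical sketch of a stability metric for feature-based explainers.
# Idea: a stable explainer should return (nearly) the same top-k features
# when re-run on the same input, despite internal randomness.

from itertools import combinations
import random


def top_k_features(importance_scores, k):
    """Return the set of indices of the k features with largest |importance|."""
    ranked = sorted(range(len(importance_scores)),
                    key=lambda i: abs(importance_scores[i]),
                    reverse=True)
    return set(ranked[:k])


def jaccard(a, b):
    """Jaccard similarity of two feature-index sets."""
    return len(a & b) / len(a | b) if (a | b) else 1.0


def stability(explain_fn, sample, runs=10, k=10):
    """Average pairwise Jaccard similarity of the top-k feature sets
    produced by repeated runs of a (possibly stochastic) explainer."""
    top_sets = [top_k_features(explain_fn(sample), k) for _ in range(runs)]
    pairs = list(combinations(top_sets, 2))
    return sum(jaccard(a, b) for a, b in pairs) / len(pairs)


# Toy stand-in for a sampling-based explainer: the "true" importances
# plus Gaussian noise, mimicking run-to-run variance of e.g. LIME.
def noisy_explainer(sample):
    return [random.gauss(w, 0.1) for w in sample]


if __name__ == "__main__":
    fake_importances = [i / 100.0 for i in range(100)]
    print(f"stability: {stability(noisy_explainer, fake_importances):.3f}")
```

Robustness and effectiveness admit analogous sketches (e.g., comparing explanations before and after small input perturbations, or removing the reported top features and observing how much the classifier's prediction degrades), though the paper's exact formulations may differ.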