A Survey of Explainable Graph Neural Networks for Cyber Malware Analysis

Malicious cyber activity is a growing concern for individuals and companies alike. While machine learning methods such as Graph Neural Networks (GNNs) have proven successful at malware detection, their outputs are often difficult to interpret. Explainable malware detection methods are needed to automatically identify malicious programs and to present the results to malware analysts in a human-interpretable form. In this survey, we outline a number of GNN explainability methods and compare their performance on a real-world malware detection dataset. Specifically, we formulate detection as a graph classification problem over the Control Flow Graphs (CFGs) of malware samples. We find that gradient-based methods outperform perturbation-based methods both in computational cost and in explainer-specific metrics (e.g., Fidelity and Sparsity). Our results provide insights for designing new GNN-based models for cyber malware detection and attribution.
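To make the two explainer metrics named above concrete, the following is a minimal sketch rather than the survey's evaluation code. It assumes the definitions commonly used in the GNN explainability literature: Fidelity as the drop in the predicted malicious probability after the nodes flagged as important are removed, and Sparsity as the fraction of nodes the explanation leaves out. The scoring function `toy_malware_score`, the four-block CFG, and all other names are hypothetical stand-ins for a trained GNN and a real sample.

```python
import math
from typing import Callable, Dict, List, Set, Tuple

# A CFG as (per-basic-block feature, directed edges between blocks).
Graph = Tuple[Dict[int, float], List[Tuple[int, int]]]


def toy_malware_score(graph: Graph) -> float:
    """Hypothetical stand-in for a trained GNN: returns P(malicious)."""
    features, edges = graph
    # One round of sum aggregation along edges, then a sigmoid readout.
    hidden = dict(features)
    for src, dst in edges:
        hidden[dst] += features[src]
    logit = sum(hidden.values()) - 2.0  # arbitrary bias, for illustration
    return 1.0 / (1.0 + math.exp(-logit))


def remove_nodes(graph: Graph, nodes: Set[int]) -> Graph:
    """Drop the given nodes and every edge that touches them."""
    features, edges = graph
    kept = {n: x for n, x in features.items() if n not in nodes}
    kept_edges = [(s, d) for s, d in edges if s in kept and d in kept]
    return kept, kept_edges


def fidelity(model: Callable[[Graph], float], graph: Graph,
             important: Set[int]) -> float:
    """Probability drop when the 'important' nodes are removed."""
    return model(graph) - model(remove_nodes(graph, important))


def sparsity(graph: Graph, important: Set[int]) -> float:
    """Fraction of nodes NOT selected by the explanation."""
    return 1.0 - len(important) / len(graph[0])


# Toy CFG: block 1 carries a large (suspicious) feature value.
cfg: Graph = ({0: 0.1, 1: 2.0, 2: 0.1, 3: 0.1}, [(0, 1), (1, 2), (2, 3)])
explanation = {1}  # nodes an explainer might flag as important

fid = fidelity(toy_malware_score, cfg, explanation)
spa = sparsity(cfg, explanation)
print(f"fidelity={fid:.3f} sparsity={spa:.2f}")
```

Under these assumed definitions, a good explanation scores high on both: removing a small set of nodes (high Sparsity) causes a large change in the prediction (high Fidelity). In practice the metrics are averaged over many graphs, and the exact formulation used in the survey may differ.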
