Vulnerability detection with fine-grained interpretations

Despite the successes of machine learning (ML) and deep learning (DL)-based vulnerability detectors (VD), they are limited to providing only the decision on whether a given code is vulnerable or not, without details on what part of the code is relevant to the detected vulnerability. We present IVDetect, an interpretable vulnerability detector with the philosophy of using Artificial Intelligence (AI) to detect vulnerabilities, while using Intelligence Assistant (IA) to provide VD interpretations in terms of vulnerable statements. For vulnerability detection, we separately consider the vulnerable statements and their surrounding contexts via data and control dependencies. This allows our model better discriminate vulnerable statements than using the mixture of vulnerable code and contextual code as in existing approaches. In addition to the coarse-grained vulnerability detection result, we leverage interpretable AI to provide users with fine-grained interpretations that include the sub-graph in the Program Dependency Graph (PDG) with the crucial statements that are relevant to the detected vulnerability. Our empirical evaluation on vulnerability databases shows that IVDetect outperforms the existing DL-based approaches by 43%–84% and 105%–255% in top-10 nDCG and MAP ranking scores. IVDetect correctly points out the vulnerable statements relevant to the vulnerability via its interpretation in 67% of the cases with a top-5 ranked list. IVDetect improves over the baseline interpretation models by 12.3%–400% and 9%–400% in accuracy.

[1]  Shaohua Wang,et al.  Improving bug detection via context-based code representation learning and attention-based neural networks , 2019, Proc. ACM Program. Lang..

[2]  Onur Ozdemir,et al.  Automated Vulnerability Detection in Source Code Using Deep Representation Learning , 2018, 2018 17th IEEE International Conference on Machine Learning and Applications (ICMLA).

[3]  Shangqing Liu,et al.  Devign: Effective Vulnerability Identification by Learning Comprehensive Program Semantics via Graph Neural Networks , 2019, NeurIPS.

[4]  Max Welling,et al.  Semi-Supervised Classification with Graph Convolutional Networks , 2016, ICLR.

[5]  Shouhuai Xu,et al.  SySeVR: A Framework for Using Deep Learning to Detect Software Vulnerabilities , 2018, IEEE Transactions on Dependable and Secure Computing.

[6]  Lars Grunske,et al.  A Critical Evaluation of Spectrum-Based Fault Localization Techniques on a Large-Scale Software System , 2017, 2017 IEEE International Conference on Software Quality, Reliability and Security (QRS).

[7]  Laurie A. Williams,et al.  Evaluating Complexity, Code Churn, and Developer Activity Metrics as Indicators of Software Vulnerabilities , 2011, IEEE Transactions on Software Engineering.

[8]  Hoan Anh Nguyen,et al.  Graph-based mining of multiple object usage patterns , 2009, ESEC/FSE '09.

[9]  Sang Peter Chin,et al.  Learning to Repair Software Vulnerabilities with Generative Adversarial Networks , 2018, NeurIPS.

[10]  Jure Leskovec,et al.  GNNExplainer: Generating Explanations for Graph Neural Networks , 2019, NeurIPS.

[11]  Shaohua Wang,et al.  A C/C++ Code Vulnerability Dataset with Code Changes and CVE Summaries , 2020, 2020 IEEE/ACM 17th International Conference on Mining Software Repositories (MSR).

[12]  Gary McGraw,et al.  ITS4: a static vulnerability scanner for C and C++ code , 2000, Proceedings 16th Annual Computer Security Applications Conference (ACSAC'00).

[13]  Wouter Joosen,et al.  Predicting Vulnerable Software Components via Text Mining , 2014, IEEE Transactions on Software Engineering.

[14]  Sang Peter Chin,et al.  Automated software vulnerability detection with machine learning , 2018, ArXiv.

[15]  Andreas Zeller,et al.  Predicting vulnerable software components , 2007, CCS '07.

[16]  Son Nguyen,et al.  Suggesting Natural Method Names to Check Name Consistencies , 2020, 2020 IEEE/ACM 42nd International Conference on Software Engineering (ICSE).

[17]  Christopher D. Manning,et al.  Improved Semantic Representations From Tree-Structured Long Short-Term Memory Networks , 2015, ACL.

[18]  Yoshua Bengio,et al.  Empirical Evaluation of Gated Recurrent Neural Networks on Sequence Modeling , 2014, ArXiv.

[19]  Hoan Anh Nguyen,et al.  Detection of recurring software vulnerabilities , 2010, ASE.

[20]  Shouhuai Xu,et al.  VulDeePecker: A Deep Learning-Based System for Vulnerability Detection , 2018, NDSS.

[21]  Baishakhi Ray,et al.  Deep Learning Based Vulnerability Detection: Are We There Yet? , 2020, IEEE Transactions on Software Engineering.

[22]  Felix FX Lindner,et al.  Vulnerability Extrapolation: Assisted Discovery of Vulnerabilities Using Machine Learning , 2011, WOOT.

[23]  Jeffrey Pennington,et al.  GloVe: Global Vectors for Word Representation , 2014, EMNLP.

[24]  Konrad Rieck,et al.  Generalized vulnerability extrapolation using abstract syntax trees , 2012, ACSAC '12.

[25]  Jianxun Liu,et al.  Feature-Attention Graph Convolutional Networks for Noise Resilient Learning , 2019, ArXiv.

[26]  Thomas Zimmermann,et al.  The Beauty and the Beast: Vulnerabilities in Red Hat's Packages , 2009, USENIX Annual Technical Conference.