Paper Information - ReGVD: Revisiting Graph Neural Networks for Vulnerability Detection

ReGVD: Revisiting Graph Neural Networks for Vulnerability Detection

Identifying vulnerabilities in source code is essential to protecting software systems from cyber security attacks. It is, however, also a challenging step that requires specialized expertise in security and code representation. Inspired by the successful applications of pre-trained programming language (PL) models such as CodeBERT and of graph neural networks (GNNs), we propose ReGVD, a general and novel graph neural network-based model for vulnerability detection. In particular, ReGVD views a given source code as a flat sequence of tokens and then examines two effective methods, utilizing unique tokens and indexes respectively, to construct a single graph as an input, wherein node features are initialized only by the embedding layer of a pre-trained PL model. Next, ReGVD leverages a practical advantage of residual connections among GNN layers and explores a beneficial mixture of graph-level sum and max poolings to return a graph embedding for the given source code. Experimental results demonstrate that ReGVD outperforms the existing state-of-the-art models and obtains the highest accuracy on the real-world benchmark dataset from CodeXGLUE for vulnerability detection.
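The pipeline the abstract describes (token sequence, graph construction, residual GNN layer, sum and max pooling) can be illustrated with a minimal pure-Python sketch. It is not the authors' implementation. The sliding-window co-occurrence rule, the window size, the weight-free mean aggregation, the additive mixture of the two poolings, and the helper names `build_graph`, `toy_embed`, `residual_layer` and `readout` are all assumptions made for illustration. In the paper, node features come from the embedding layer of a pre-trained PL model, which is replaced here by a toy two-dimensional embedding.

```python
def build_graph(tokens, window=3, unique=True):
    """Build one graph from a flat token sequence.

    unique=True  : one node per distinct token (unique-token variant).
    unique=False : one node per token position (index variant).
    Edges link nodes that co-occur inside a sliding window (assumed rule).
    """
    if unique:
        node_tokens = list(dict.fromkeys(tokens))  # keeps first-seen order
        pos_to_node = [node_tokens.index(t) for t in tokens]
    else:
        node_tokens = list(tokens)
        pos_to_node = list(range(len(tokens)))
    n = len(node_tokens)
    adj = [[0.0] * n for _ in range(n)]
    for start in range(len(tokens)):
        span = pos_to_node[start:start + window]
        for a in span:
            for b in span:
                if a != b:
                    adj[a][b] = 1.0
    return node_tokens, adj


def toy_embed(token):
    """Hypothetical stand-in for a pre-trained embedding layer."""
    return [float(len(token)), 1.0]


def residual_layer(adj, feats):
    """One simplified GNN layer with a residual connection.

    Computes H' = H + ReLU(mean over self and neighbours of H).
    Learned weight matrices are omitted to keep the sketch short.
    """
    n, d = len(feats), len(feats[0])
    out = []
    for i in range(n):
        neigh = [j for j in range(n) if j == i or adj[i][j] > 0]
        agg = [sum(feats[j][k] for j in neigh) / len(neigh) for k in range(d)]
        out.append([feats[i][k] + max(0.0, agg[k]) for k in range(d)])
    return out


def readout(feats):
    """Graph embedding as a mixture of sum and max pooling.

    The two pooled vectors are added element-wise, which is only one
    possible mixture and is an assumption of this sketch.
    """
    d = len(feats[0])
    sum_pool = [sum(row[k] for row in feats) for k in range(d)]
    max_pool = [max(row[k] for row in feats) for k in range(d)]
    return [s + m for s, m in zip(sum_pool, max_pool)]


tokens = "int x = x + 1 ;".split()
nodes, adj = build_graph(tokens, window=3, unique=True)
feats = [toy_embed(t) for t in nodes]
hidden = residual_layer(adj, feats)
embedding = readout(hidden)
print(nodes)
print(embedding)
```

Calling `build_graph(tokens, unique=False)` gives the index variant, in which the repeated token `x` becomes two separate nodes. In a full model the graph embedding would then be passed to a classifier that predicts whether the code is vulnerable.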
