Self-Attention-Based Automated Vulnerability Detection with Effective Data Representation

Vulnerability detection is an important means of protecting software systems from network attacks and ensuring data security. Automatic vulnerability detection with machine learning has become a research hotspot in recent years. Deep learning relieves human experts of the tedious and arduous work of defining vulnerability features by hand, and it learns high-level features that experts cannot define intuitively. Among neural network architectures, the Recurrent Neural Network (RNN) is structurally well suited to processing sequences and has achieved excellent results in vulnerability detection. In 2017, the Transformer was proposed in the field of Natural Language Processing (NLP); built on the self-attention mechanism, it replaces the RNN's sequential processing of text and outperforms RNNs on many natural language tasks. This paper proposes using the Transformer to automatically detect vulnerabilities in code slices. First, we extract code slices at a granularity finer than the function level, which expresses vulnerability patterns more accurately. Second, we propose an effective data representation method that retains more semantic information. Finally, our experiments show that the Transformer outperforms RNN-based models in overall performance, and that the effective data representation significantly improves the detection performance of deep neural networks.
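For reference, the self-attention mechanism underlying the Transformer is the scaled dot-product attention of Vaswani et al. (2017): given query, key, and value matrices $Q$, $K$, and $V$ with key dimension $d_k$, it computes

\[
\mathrm{Attention}(Q, K, V) = \mathrm{softmax}\!\left( \frac{QK^{\top}}{\sqrt{d_k}} \right) V ,
\]

which lets every token in a code slice attend directly to every other token, rather than propagating information step by step as an RNN does.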