VulDeeLocator: A Deep Learning-based Fine-grained Vulnerability Detector

Automatically detecting software vulnerabilities is an important problem that has attracted much attention. However, existing vulnerability detectors still cannot achieve the detection capability and locating precision that would warrant their adoption for real-world use. In this paper, we present Vulnerability Deep Learning-based Locator (VulDeeLocator), a deep learning-based fine-grained vulnerability detector for C programs whose source code is available. VulDeeLocator advances the state of the art by simultaneously achieving a high detection capability and a high locating precision. When applied to three real-world software products, VulDeeLocator detects four vulnerabilities that are not reported in the National Vulnerability Database (NVD); three of them were not known to exist in these products until now, while the fourth had been "silently" patched by the vendor in newer versions of the vulnerable product. The core innovations underlying VulDeeLocator are (i) the use of intermediate code to accommodate semantic information that cannot be conveyed by source code-based representations, and (ii) the concept of granularity refinement for precisely pinning down the locations of vulnerabilities.
