VulPecker: an automated vulnerability detection system based on code similarity analysis

Software vulnerabilities are the fundamental cause of many attacks. Even with rapid vulnerability patching, the problem is more complicated than it looks. One reason is that instances of the same vulnerability may exist in multiple software copies that are difficult to track in real life (e.g., different versions of libraries and applications). This calls for tools that can automatically search for vulnerable software with respect to a given vulnerability. In this paper, we move a step forward in this direction by presenting Vulnerability Pecker (VulPecker), a system for automatically detecting whether a piece of software source code contains a given vulnerability or not. The key insight underlying VulPecker is to leverage (i) a set of features that we define to characterize patches, and (ii) code-similarity algorithms that have been proposed for various purposes, while noting that no single code-similarity algorithm is effective for all kinds of vulnerabilities. Experiments show that VulPecker detects 40 vulnerabilities that are not published in the National Vulnerability Database (NVD). Among these vulnerabilities, 18 are not known for their existence and have yet to be confirmed by vendors at the time of writing (these vulnerabilities are "anonymized" in the present paper for ethical reasons), and the other 22 vulnerabilities have been "silently" patched by the vendors in the later releases of the vulnerable products.

[1]  Andreas Zeller,et al.  Predicting vulnerable software components , 2007, CCS '07.

[2]  Yuanyuan Zhou,et al.  CP-Miner: finding copy-paste and related bugs in large-scale software code , 2006, IEEE Transactions on Software Engineering.

[3]  Heejo Lee,et al.  A Scalable Approach for Vulnerability Discovery Based on Security Patches , 2014 .

[4]  Michael D. Ernst,et al.  CBCD: Cloned buggy code detector , 2012, 2012 34th International Conference on Software Engineering (ICSE).

[5]  Leyla Bilge,et al.  The Attack of the Clones: A Study of the Impact of Shared Code on Vulnerability Patching , 2015, 2015 IEEE Symposium on Security and Privacy.

[6]  Tevfik Bultan,et al.  Automated Test Generation from Vulnerability Signatures , 2014, 2014 IEEE Seventh International Conference on Software Testing, Verification and Validation.

[7]  Felix FX Lindner,et al.  Vulnerability Extrapolation: Assisted Discovery of Vulnerabilities Using Machine Learning , 2011, WOOT.

[8]  Giovanni Vigna,et al.  Static Detection of Vulnerabilities in x86 Executables , 2006, 2006 22nd Annual Computer Security Applications Conference (ACSAC'06).

[9]  Ian Witten,et al.  Data Mining , 2000 .

[10]  Hao Wang,et al.  Towards automatic generation of vulnerability-based signatures , 2006, 2006 IEEE Symposium on Security and Privacy (S&P'06).

[11]  Andrew Meneely,et al.  When a Patch Goes Bad: Exploring the Properties of Vulnerability-Contributing Commits , 2013, 2013 ACM / IEEE International Symposium on Empirical Software Engineering and Measurement.

[12]  Konrad Rieck,et al.  Generalized vulnerability extrapolation using abstract syntax trees , 2012, ACSAC '12.

[13]  Konrad Rieck,et al.  Modeling and Discovering Vulnerabilities with Code Property Graphs , 2014, 2014 IEEE Symposium on Security and Privacy.

[14]  Hoan Anh Nguyen,et al.  Detection of recurring software vulnerabilities , 2010, ASE.

[15]  อนิรุธ สืบสิงห์,et al.  Data Mining Practical Machine Learning Tools and Techniques , 2014 .

[16]  David Brumley,et al.  ReDeBug: Finding Unpatched Code Clones in Entire OS Distributions , 2012, 2012 IEEE Symposium on Security and Privacy.

[17]  Chih-Jen Lin,et al.  LIBSVM: A library for support vector machines , 2011, TIST.

[18]  Rainer Koschke,et al.  Clone Detection Using Abstract Syntax Suffix Trees , 2006, 2006 13th Working Conference on Reverse Engineering.

[19]  Matthew Smith,et al.  VCCFinder: Finding Potential Vulnerabilities in Open-Source Projects to Assist Code Audits , 2015, CCS.

[20]  Shinji Kusumoto,et al.  CCFinder: A Multilinguistic Token-Based Code Clone Detection System for Large Scale Source Code , 2002, IEEE Trans. Software Eng..

[21]  Matias Martinez,et al.  Fine-grained and accurate source code differencing , 2014, ASE.

[22]  Mohammad Zulkernine,et al.  Using complexity, coupling, and cohesion metrics as early indicators of vulnerabilities , 2011, J. Syst. Archit..

[23]  William K. Robertson,et al.  LAVA: Large-Scale Automated Vulnerability Addition , 2016, 2016 IEEE Symposium on Security and Privacy (SP).

[24]  Philip S. Yu,et al.  GPLAG: detection of software plagiarism by program dependence graph analysis , 2006, KDD '06.

[25]  Maninder Singh,et al.  Software clone detection: A systematic review , 2013, Inf. Softw. Technol..

[26]  Giuliano Antoniol,et al.  Comparison and Evaluation of Clone Detection Tools , 2007, IEEE Transactions on Software Engineering.

[27]  Thierry Lavoie,et al.  Uncovering access control weaknesses and flaws with security-discordant software clones , 2013, ACSAC.

[28]  Susan Horwitz,et al.  Using Slicing to Identify Duplication in Source Code , 2001, SAS.