Discovering Vulnerable Functions: A Code Similarity Based Approach

This paper extends recent work on vulnerability extrapolation. A surge in vulnerability exploits against old and new softwares, urges the importance of detection of vulnerabilities and possible attacks prior to the attacker. How sophisticated an exploit may be, an underlying prerequisite remains to be the presence of at least one memory corruption bug, serving as entry point for the exploit. Therefore several rigorous software testing techniques are borrowed to detect and eliminate software bugs as early as possible. Code similarity based bug detection is one of such techniques, which, in the parlance of software security, is also termed as vulnerability extrapolation. In this paper, we present a source code similarity based bug identification technique by considering code features that are relevant for security related bugs. Our technique works by enriching (augmenting) abstract syntax trees (ASTs) of functions by considering security relevant properties of the code. We show the effectiveness of the augmented AST based similarity approach over existing methods by evaluating proposed method on real-world applications.

[1]  Pedram Amini,et al.  Fuzzing: Brute Force Vulnerability Discovery , 2007 .

[2]  Gary McGraw,et al.  ITS4: a static vulnerability scanner for C and C++ code , 2000, Proceedings 16th Annual Computer Security Applications Conference (ACSAC'00).

[3]  Konrad Rieck,et al.  Generalized vulnerability extrapolation using abstract syntax trees , 2012, ACSAC '12.

[4]  Sam Ransbotham,et al.  An Empirical Analysis of Exploitation Attempts Based on Vulnerabilities in Open Source Software , 2010, WEIS.

[5]  Stéphane Ducasse,et al.  A language independent approach for detecting duplicated code , 1999, Proceedings IEEE International Conference on Software Maintenance - 1999 (ICSM'99). 'Software Maintenance for Business Change' (Cat. No.99CB36360).

[6]  T. Landauer,et al.  Indexing by Latent Semantic Analysis , 1990 .

[7]  David Evans,et al.  Improving Security Using Extensible Lightweight Static Analysis , 2002, IEEE Softw..

[8]  Renato De Mori,et al.  Pattern matching for clone and concept detection , 2004, Automated Software Engineering.

[9]  Sean Heelan Vulnerability Detection Systems: Think Cyborg, Not Robot , 2011, IEEE Security & Privacy.

[10]  Felix FX Lindner,et al.  Vulnerability Extrapolation: Assisted Discovery of Vulnerabilities Using Machine Learning , 2011, WOOT.

[11]  Michael W. Godfrey,et al.  Toward a Taxonomy of Clones in Source Code: A Case Study , 2003 .

[12]  James Newsome,et al.  Dynamic Taint Analysis for Automatic Detection, Analysis, and SignatureGeneration of Exploits on Commodity Software , 2005, NDSS.

[13]  Chadd C. Williams,et al.  Automatic mining of source code repositories to improve bug finding techniques , 2005, IEEE Transactions on Software Engineering.

[14]  David Brumley,et al.  All You Ever Wanted to Know about Dynamic Taint Analysis and Forward Symbolic Execution (but Might Have Been Afraid to Ask) , 2010, 2010 IEEE Symposium on Security and Privacy.

[15]  Sanjay Rawat,et al.  Finding Buffer Overflow Inducing Loops in Binary Executables , 2012, 2012 IEEE Sixth International Conference on Software Security and Reliability.

[16]  Yuanyuan Zhou,et al.  CP-Miner: finding copy-paste and related bugs in large-scale software code , 2006, IEEE Transactions on Software Engineering.