Discovering Vulnerable Functions by Extrapolation: A Control-Flow Graph Similarity Based Approach

We present a vulnerability extrapolation method for identifying vulnerable functions in source code. Given a known vulnerable function, the method extrapolates from it to find similar functions elsewhere in the code base. Vulnerability extrapolation rests on the observation that the behavior underlying a known vulnerability often recurs in many other functions. To capture similarity, we represent functions in terms of syntactic and semantic patterns. These patterns are derived from several code features, such as API usage patterns, argument types, and the control-flow graph (CFG) of each function. We employ graph kernels, a recent technique, to compute similarity directly on the CFGs of functions. We empirically demonstrate the capabilities of the proposed method by applying it to real-world applications to identify vulnerabilities.
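
To make the idea of computing similarity directly on CFGs concrete, the following is a minimal sketch of one possible graph kernel, a Weisfeiler-Lehman style subtree kernel. The abstract does not say which kernel the method uses, so the choice of kernel, the function names (`wl_features`, `wl_kernel`, `normalized_similarity`), the node labels, and the toy CFGs are all assumptions made for illustration only.

```python
from collections import Counter


def wl_features(adj, labels, iterations=2):
    """Count Weisfeiler-Lehman subtree labels for one CFG.

    adj:    dict mapping each node to a list of its successor nodes
    labels: dict mapping each node to an initial label (hypothetical,
            e.g. the kind of statement in the basic block)
    """
    current = dict(labels)
    feats = Counter(current.values())
    for _ in range(iterations):
        relabeled = {}
        for node in adj:
            # New label = own label plus the sorted labels of the successors.
            neighbours = sorted(current[n] for n in adj[node])
            relabeled[node] = current[node] + "(" + ",".join(neighbours) + ")"
        current = relabeled
        feats.update(current.values())
    return feats


def wl_kernel(f1, f2):
    """Dot product of two label-count vectors."""
    return sum(f1[k] * f2[k] for k in f1.keys() & f2.keys())


def normalized_similarity(g1, g2, iterations=2):
    """Kernel value scaled so that identical graphs score 1."""
    f1 = wl_features(g1[0], g1[1], iterations)
    f2 = wl_features(g2[0], g2[1], iterations)
    denom = (wl_kernel(f1, f1) * wl_kernel(f2, f2)) ** 0.5
    return wl_kernel(f1, f2) / denom if denom else 0.0


# Toy CFGs (assumed for illustration): a known vulnerable function and
# two candidates from the same code base.
known = ({0: [1], 1: [2, 3], 2: [3], 3: []},
         {0: "entry", 1: "branch", 2: "call", 3: "exit"})
same_shape = ({"a": ["b"], "b": ["c", "d"], "c": ["d"], "d": []},
              {"a": "entry", "b": "branch", "c": "call", "d": "exit"})
straight_line = ({0: [1], 1: [2], 2: []},
                 {0: "entry", 1: "call", 2: "exit"})

candidates = {"same_shape": same_shape, "straight_line": straight_line}
ranking = sorted(candidates,
                 key=lambda name: normalized_similarity(known, candidates[name]),
                 reverse=True)
```

In this sketch, extrapolation amounts to ranking candidate functions by their kernel similarity to the known vulnerable function and inspecting the top of the list first. A real pipeline would also need the other features named above (API usage patterns and argument types), which this example leaves out.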
