Machine-Learning-Guided Selectively Unsound Static Analysis

We present a machine-learning-based technique for selectively applying unsoundness in static analysis. Existing bug-finding static analyzers are unsound in order to be precise and scalable in practice. However, they are uniformly unsound and hence at the risk of missing a large amount of real bugs. By being sound, we can improve the detectability of the analyzer but it often suffers from a large number of false alarms. Our approach aims to strike a balance between these two approaches by selectively allowing unsoundness only when it is likely to reduce false alarms, while retaining true alarms. We use an anomaly-detection technique to learn such harmless unsoundness. We implemented our technique in two static analyzers for full C. One is for a taint analysis for detecting format-string vulnerabilities, and the other is for an interval analysis for buffer-overflow detection. The experimental results show that our approach significantly improves the recall of the original unsound analysis without sacrificing the precision.

[1]  Dawson R. Engler,et al.  A few billion lines of code later , 2010, Commun. ACM.

[2]  Dawson R. Engler,et al.  Z-Ranking: Using Statistical Analysis to Counter the Impact of Static Analysis Approximations , 2003, SAS.

[3]  Yuanyuan Zhou,et al.  BugBench: Benchmarks for Evaluating Bug Detection Tools , 2005 .

[4]  Bernhard E. Boser,et al.  A training algorithm for optimal margin classifiers , 1992, COLT '92.

[5]  Xin Zhang,et al.  Finding optimum abstractions in parametric dataflow analysis , 2013, PLDI.

[6]  Alexander Aiken,et al.  Context- and path-sensitive memory leak detection , 2005, ESEC/FSE-13.

[7]  Dawson R. Engler,et al.  ARCHER: using symbolic, path-sensitive analysis to detect memory access errors , 2003, ESEC/FSE-11.

[8]  Hongseok Yang,et al.  Learning a strategy for adapting a program analysis via bayesian optimisation , 2015, OOPSLA.

[9]  Xin Zhang,et al.  A user-guided approach to program analysis , 2015, ESEC/SIGSOFT FSE.

[10]  Mayur Naik,et al.  Learning minimal abstractions , 2011, POPL '11.

[11]  Bernhard Schölkopf,et al.  Estimating the Support of a High-Dimensional Distribution , 2001, Neural Computation.

[12]  Alexander Aiken,et al.  Saturn: A SAT-Based Tool for Bug Detection , 2005, CAV.

[13]  Junfeng Yang,et al.  Correlation exploitation in error ranking , 2004, SIGSOFT '04/FSE-12.

[14]  Richard Lippmann,et al.  Testing static analysis tools using exploitable buffer overflows from open source code , 2004, SIGSOFT '04/FSE-12.

[15]  Peter Müller,et al.  An Experimental Evaluation of Deliberate Unsoundness in a Static Program Analyzer , 2015, VMCAI.

[16]  Mark Lillibridge,et al.  Extended static checking for Java , 2002, PLDI '02.

[17]  Premkumar T. Devanbu,et al.  Delft University of Technology On the “ Naturalness ” of Buggy Code , 2017 .

[18]  Kwangkeun Yi,et al.  Taming False Alarms from a Domain-Unaware C Analyzer by a Bayesian Statistical Post Analysis , 2005, SAS.

[19]  Hongseok Yang,et al.  Abstractions from tests , 2012, POPL '12.

[20]  Hongseok Yang,et al.  Selective context-sensitivity guided by impact pre-analysis , 2014, PLDI.