Adaptive Static Analysis via Learning with Bayesian Optimization

Building a cost-effective static analyzer for real-world programs is still regarded as an art. One key contributor to this grim reputation is the difficulty of balancing the cost and the precision of an analyzer. An ideal analyzer should adapt to a given analysis task and avoid precision-improving techniques whose benefit does not justify their cost. Achieving this ideal, however, is highly nontrivial and requires a great deal of engineering effort. In this article, we present a new learning-based approach to adaptive static analysis. In our approach, the analysis includes a sophisticated parameterized strategy that decides, for each part of a given program, whether or not to apply a precision-improving technique to that part. We present a method for learning a good parameter for such a strategy from an existing codebase via Bayesian optimization; the learned strategy is then used for new, unseen programs. Using our approach, we developed partially flow- and context-sensitive variants of a realistic C static analyzer. The experimental results demonstrate that Bayesian optimization is crucial for learning from an existing codebase. They also show that, among all program queries that require flow- or context-sensitivity, our partially flow- and context-sensitive analysis answers 75% of them, while increasing the analysis cost by only 3.3× over the baseline flow- and context-insensitive analysis, compared with 40× or more for the fully sensitive version.
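The learning loop described above can be sketched as follows. This is a minimal, self-contained illustration, not the authors' implementation: the strategy parameter is reduced to a single threshold `k` in [0, 1] (in the article it is a feature-weight vector over program parts), `strategy_score` is a synthetic stand-in for the real objective (queries proved on the training codebase, penalized by analysis cost), and the surrogate is a toy Gaussian process with a fixed RBF kernel, optimized by expected improvement.

```python
import math

def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(A)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][c] * x[c] for c in range(r + 1, n))) / M[r][r]
    return x

def rbf(a, b, ell=0.15):
    return math.exp(-((a - b) ** 2) / (2 * ell * ell))

def gp_posterior(xs, ys, x):
    """Zero-mean GP posterior (mean, stddev) at x given observations (xs, ys)."""
    n = len(xs)
    K = [[rbf(xs[i], xs[j]) + (1e-8 if i == j else 0.0) for j in range(n)]
         for i in range(n)]
    k = [rbf(x, xi) for xi in xs]
    alpha = solve(K, ys)                       # K^{-1} y
    v = solve(K, k)                            # K^{-1} k
    mean = sum(ki * ai for ki, ai in zip(k, alpha))
    var = max(1e-12, rbf(x, x) - sum(ki * vi for ki, vi in zip(k, v)))
    return mean, math.sqrt(var)

def expected_improvement(mean, sd, best):
    z = (mean - best) / sd
    cdf = 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))
    pdf = math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)
    return (mean - best) * cdf + sd * pdf

def strategy_score(k):
    # Synthetic stand-in: "queries proved minus cost penalty" as a
    # function of the precision threshold k; real evaluations would
    # run the parameterized analysis over the training codebase.
    return math.exp(-((k - 0.62) ** 2) / 0.02)

def bayes_opt(f, iters=15):
    xs = [0.0, 0.25, 0.5, 0.75, 1.0]           # initial design
    ys = [f(x) for x in xs]
    candidates = [i / 100.0 for i in range(101)]
    for _ in range(iters):
        best = max(ys)
        pool = [x for x in candidates if x not in xs]
        # Evaluate next the candidate maximizing expected improvement.
        nxt = max(pool, key=lambda x: expected_improvement(*gp_posterior(xs, ys, x), best))
        xs.append(nxt)
        ys.append(f(nxt))
    i = max(range(len(ys)), key=ys.__getitem__)
    return xs[i], ys[i]

best_k, best_val = bayes_opt(strategy_score)
```

The point of using Bayesian optimization here is that each objective evaluation is expensive (a full run of the analysis over the codebase), so the surrogate model is used to spend those evaluations only where improvement is likely.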
