User-guided program reasoning using Bayesian inference

Program analyses necessarily make approximations, which often lead them to report true alarms interspersed with many false alarms. We propose a new approach that leverages user feedback to guide program analyses towards true alarms and away from false alarms. Our approach associates each alarm with a confidence value by performing Bayesian inference on a probabilistic model derived from the analysis rules. In each iteration, the user inspects the alarm with the highest confidence and labels its ground truth, and the approach recomputes the confidences of the remaining alarms given this feedback. It thereby maximizes the return on the user's effort in inspecting each alarm. We have implemented our approach in a tool named Bingo for program analyses expressed in Datalog. Experiments with real users and two sophisticated analyses (a static datarace analysis for Java programs and a static taint analysis for Android apps) show significant improvements on a range of metrics, including false alarm rates and number of bugs found.
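The inspect-label-recompute loop described above can be illustrated with a toy model. The following is a minimal sketch and not Bingo's implementation: the grounded rules, their firing probabilities, and the tuple names (`shared`, `race1`, and so on) are hypothetical, and exhaustive enumeration of possible worlds stands in for a scalable inference procedure.

```python
from itertools import product

# Hypothetical grounded analysis rules: (firing probability, body, head).
# A rule "fires" with the given probability; a tuple is derived only if
# some fired rule has all of its body tuples derived.
CLAUSES = [
    (0.6, [], "shared"),          # an imprecise intermediate fact
    (0.9, ["shared"], "race1"),
    (0.8, ["shared"], "race2"),
    (0.6, [], "other"),
    (0.5, ["other"], "race3"),
]
ALARMS = ["race1", "race2", "race3"]


def derive(fired):
    """Least fixpoint of the clauses that fired in this world."""
    true = set()
    changed = True
    while changed:
        changed = False
        for on, (_, body, head) in zip(fired, CLAUSES):
            if on and head not in true and all(b in true for b in body):
                true.add(head)
                changed = True
    return true


def confidences(labels):
    """P(alarm is true | user labels), by enumerating every world."""
    total = 0.0
    mass = {a: 0.0 for a in ALARMS}
    for fired in product([False, True], repeat=len(CLAUSES)):
        weight = 1.0
        for on, (p, _, _) in zip(fired, CLAUSES):
            weight *= p if on else 1.0 - p
        true = derive(fired)
        # Discard worlds that contradict the feedback given so far.
        if any((a in true) != v for a, v in labels.items()):
            continue
        total += weight
        for a in ALARMS:
            if a in true:
                mass[a] += weight
    return {a: mass[a] / total for a in ALARMS}


def rank_interactively(oracle):
    """Repeatedly present the most confident unlabeled alarm to the user."""
    labels, order = {}, []
    while len(labels) < len(ALARMS):
        conf = confidences(labels)
        nxt = max((a for a in ALARMS if a not in labels), key=conf.get)
        labels[nxt] = oracle(nxt)  # the user's ground-truth label
        order.append(nxt)
    return order


if __name__ == "__main__":
    truth = {"race1": False, "race2": False, "race3": True}
    print(rank_interactively(truth.get))
```

In this sketch, `race1` and `race2` share the premise `shared`, so labeling `race1` as a false alarm lowers the confidence of `race2` below that of the unrelated `race3`, and `race3` is presented next. That is the kind of generalization from feedback the abstract describes, shown here only on a hand-built example.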
