Dynamic inference of likely data preconditions over predicates by tree learning

We present a technique to infer likely data preconditions for procedures written in an imperative programming language. Given a procedure and a set of predicates over its inputs, our technique enumerates different truth assignments to the predicates, deriving test cases from each feasible truth assignment. The predicates themselves are derived automatically using simple heuristics. The truth assignments are enumerated using a propositional SAT solver together with a theory satisfiability checker capable of generating unsatisfiable cores. For each truth assignment, a corresponding set of test cases is generated and executed. Based on the result of the execution, the truth assignment is classified as safe or buggy. Finally, a decision tree classifier is used to generate a Boolean formula over the input predicates that explains the data obtained from the test cases. The resulting Boolean formula is, in effect, a likely data precondition for the procedure under consideration. We apply our technique to a wide variety of functions from the standard C library. Our experiments show that the proposed technique is quite robust: in most cases, it successfully learns a precondition that captures a safe and permissive calling environment.
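The pipeline described above can be illustrated with a minimal sketch. It is not the authors' implementation: a brute-force search over a small input domain stands in for the SAT solver and theory satisfiability checker, and a hand-rolled Gini-based tree learner stands in for the decision tree classifier. The procedure `copy_prefix`, the three predicates, and the input ranges are all hypothetical and chosen only for illustration.

```python
from itertools import product

# Hypothetical procedure under test: raises on unsafe inputs.
def copy_prefix(buf_len, n):
    buf = [0] * buf_len
    if n < 0:
        raise ValueError("negative length")
    for i in range(n):
        buf[i] = 1  # IndexError when n > buf_len
    return buf

# Hypothetical predicates over the inputs (the paper derives these heuristically).
PREDICATES = [
    ("n >= 0", lambda b, n: n >= 0),
    ("n <= buf_len", lambda b, n: n <= b),
    ("buf_len > 0", lambda b, n: b > 0),
]
BUF_LENS = range(0, 4)   # assumed small search domain
NS = range(-2, 5)

def witnesses(assignment, limit=3):
    """Brute-force stand-in for the solver: inputs realizing a truth assignment."""
    found = []
    for b, n in product(BUF_LENS, NS):
        if all(p(b, n) == v for (_, p), v in zip(PREDICATES, assignment)):
            found.append((b, n))
            if len(found) == limit:
                break
    return found

def classify(assignment):
    """Return 'safe', 'buggy', or None when the assignment is infeasible."""
    tests = witnesses(assignment)
    if not tests:
        return None
    for b, n in tests:
        try:
            copy_prefix(b, n)
        except Exception:
            return "buggy"
    return "safe"

def impurity(rows):
    """Gini impurity of a set of (assignment, label) rows."""
    if not rows:
        return 0.0
    p = sum(label == "safe" for _, label in rows) / len(rows)
    return 2 * p * (1 - p)

def learn(rows, features):
    """Learn a tree: a leaf label, or (feature, true_subtree, false_subtree)."""
    labels = [label for _, label in rows]
    if len(set(labels)) == 1 or not features:
        return max(set(labels), key=labels.count)

    def split(f):
        return ([r for r in rows if r[0][f]], [r for r in rows if not r[0][f]])

    def cost(f):
        t, e = split(f)
        return len(t) * impurity(t) + len(e) * impurity(e)

    f = min(features, key=cost)
    rest = [g for g in features if g != f]
    t, e = split(f)
    if not t or not e:  # feature does not separate the rows; skip it
        return learn(rows, rest)
    return (f, learn(t, rest), learn(e, rest))

def safe_paths(tree, path=()):
    """Collect the conjunctions of literals along every path to a 'safe' leaf."""
    if tree == "safe":
        return [path]
    if tree == "buggy":
        return []
    f, t, e = tree
    name = PREDICATES[f][0]
    return safe_paths(t, path + (name,)) + safe_paths(e, path + (f"not ({name})",))

def infer_precondition():
    rows = []
    for assignment in product([True, False], repeat=len(PREDICATES)):
        label = classify(assignment)
        if label is not None:  # keep feasible assignments only
            rows.append((assignment, label))
    tree = learn(rows, list(range(len(PREDICATES))))
    return " or ".join("(" + " and ".join(p) + ")" for p in safe_paths(tree))

print(infer_precondition())  # prints: (n >= 0 and n <= buf_len)
```

In this toy setting the learned formula drops the predicate `buf_len > 0`, since it does not help separate safe from buggy assignments. The result is only a likely precondition: it explains the executed tests and is not a proof of safety.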
