Learning Invariants using Decision Trees

The problem of inferring an inductive invariant for verifying program safety can be formulated in terms of binary classification. This is a standard problem in machine learning: given a sample of good and bad points, one is asked to find a classifier that generalizes from the sample and separates the two sets. Here, the good points are the reachable states of the program, and the bad points are the states from which a violation of the safety property is reachable. Thus, a learned classifier is a candidate invariant. In this paper, we propose a new algorithm that uses decision trees to learn candidate invariants in the form of arbitrary Boolean combinations of numerical inequalities. We have used our algorithm to verify C programs taken from the literature. The algorithm is able to infer safe invariants for a range of challenging benchmarks and compares favorably to other ML-based invariant inference techniques. In particular, it scales well to large sample sets.
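To make the classification view concrete, here is a minimal sketch (not the paper's algorithm) that fits an off-the-shelf decision tree to hypothetical good and bad program states over two variables x and y, then reads the learned classifier back as a Boolean combination of numerical inequalities. The sample data and the `tree_to_formula` helper are illustrative assumptions; only scikit-learn's standard `DecisionTreeClassifier` API is relied upon.

```python
# Sketch: invariant candidates as decision-tree classifiers.
# Assumed setup: states are points (x, y); label 1 = reachable ("good"),
# label 0 = can reach a safety violation ("bad"). Data is hypothetical.
import numpy as np
from sklearn.tree import DecisionTreeClassifier

good = np.array([[0, 0], [1, 1], [2, 2], [3, 3]])
bad = np.array([[0, 5], [1, 6], [2, 7]])
X = np.vstack([good, bad])
y = np.array([1] * len(good) + [0] * len(bad))

clf = DecisionTreeClassifier(random_state=0).fit(X, y)

def tree_to_formula(tree, names, node=0):
    """Recursively render the fitted tree as an (unsimplified) Boolean
    combination of inequalities; 'true' leaves classify states as good."""
    t = tree.tree_
    if t.children_left[node] == -1:  # leaf node
        return "true" if int(np.argmax(t.value[node])) == 1 else "false"
    feat, thr = names[t.feature[node]], t.threshold[node]
    left = tree_to_formula(tree, names, t.children_left[node])
    right = tree_to_formula(tree, names, t.children_right[node])
    return f"(({feat} <= {thr:.1f} and {left}) or ({feat} > {thr:.1f} and {right}))"

# The printed formula is a candidate invariant: it holds on all sampled
# good states and excludes all sampled bad states.
print(tree_to_formula(clf, ["x", "y"]))
```

Because each internal node tests a single numerical feature against a threshold, every path through the tree denotes a conjunction of inequalities, and the disjunction over good-labeled leaves is exactly the kind of arbitrary Boolean combination the abstract describes.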
