Learning to Verify the Heap

We present a data-driven verification framework that automatically proves memory safety and functional correctness of heap-manipulating programs. To this end, we introduce a novel statistical machine learning technique that maps observed program states to (possibly disjunctive) separation logic formulas describing the invariant shape of (possibly nested) data structures at relevant program locations. We then attempt to verify these predictions with a theorem prover; counterexamples to a predicted invariant are fed back to the shape predictor as additional input in a refinement loop. Once valid shape invariants are obtained, a second learning algorithm strengthens them with data invariants, again in a refinement loop driven by the underlying theorem prover. We have implemented our techniques in Cricket, an extension of the GRASShopper verification tool. Cricket automatically proves memory safety and correctness of implementations of a variety of classical heap-manipulating programs, such as insertion sort, quicksort, and traversals of nested data structures.
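The predict, verify, and refine loop described above can be illustrated with a deliberately tiny, hypothetical example. This sketch is not Cricket's algorithm. It makes three substitutions:

- The statistical shape predictor is replaced by a Houdini-style learner that keeps every predicate from a fixed pool (`CANDIDATES`, a name introduced here) that holds on all observed states.
- The theorem prover is replaced by a brute-force search over a bounded domain (`BOUND`, also an assumption).
- The heap program is replaced by a two-variable integer loop.

What the sketch does share with the framework is the loop structure. Observed states yield a candidate invariant. The checker then either accepts the candidate or returns a counterexample state, which becomes extra input to the learner.

```python
from itertools import product
from typing import Callable, Dict, List, Optional, Set, Tuple

State = Tuple[int, int]  # toy loop-head state (x, y); stands in for a heap snapshot
BOUND = range(-5, 25)    # finite domain searched by the stand-in "prover"

# Hypothetical predicate pool for the toy program:
#   x = 0; y = 0
#   while x < 10: x += 1; y += 2
#   assert y == 2 * x
CANDIDATES: Dict[str, Callable[[State], bool]] = {
    "x >= 0": lambda s: s[0] >= 0,
    "y >= 0": lambda s: s[1] >= 0,
    "y == 2*x": lambda s: s[1] == 2 * s[0],
    "x <= 10": lambda s: s[0] <= 10,
    "y <= 10": lambda s: s[1] <= 10,
    "x == y": lambda s: s[0] == s[1],
    "y % 2 == 0": lambda s: s[1] % 2 == 0,
    "x < 10": lambda s: s[0] < 10,
}


def guard(s: State) -> bool:
    return s[0] < 10  # loop condition


def step(s: State) -> State:
    return (s[0] + 1, s[1] + 2)  # loop body


def observe(iterations: int) -> List[State]:
    """Run the loop for a few iterations, recording each loop-head state."""
    s = (0, 0)
    trace = [s]
    for _ in range(iterations):
        if not guard(s):
            break
        s = step(s)
        trace.append(s)
    return trace


def learn(samples: List[State]) -> Set[str]:
    """Learner: keep every pool predicate that holds on all samples."""
    return {name for name, p in CANDIDATES.items() if all(p(s) for s in samples)}


def holds(inv: Set[str], s: State) -> bool:
    return all(CANDIDATES[name](s) for name in inv)


def check_inductive(inv: Set[str]) -> Optional[State]:
    """Stand-in prover: return a successor state that breaks inv, or None."""
    for s in product(BOUND, BOUND):
        if holds(inv, s) and guard(s) and not holds(inv, step(s)):
            return step(s)
    return None


def check_safe(inv: Set[str]) -> bool:
    """Does inv imply the assertion y == 2*x on every loop-exit state in BOUND?"""
    return all(s[1] == 2 * s[0] for s in product(BOUND, BOUND)
               if holds(inv, s) and not guard(s))


def refine(samples: List[State]) -> Tuple[Set[str], List[State]]:
    """Alternate learning and checking until the candidate is inductive."""
    counterexamples: List[State] = []
    while True:
        inv = learn(samples)
        cex = check_inductive(inv)
        if cex is None:
            return inv, counterexamples
        # cex violates some predicate in inv, so the next candidate is strictly
        # weaker; for a conjunctive pool this never drops a predicate belonging
        # to an inductive sub-conjunction of inv (the Houdini argument).
        counterexamples.append(cex)
        samples = samples + [cex]


inv, cexs = refine(observe(2))
print(sorted(inv), cexs, check_safe(inv))
```

The run starts from only the first three loop-head states. From these, the learner initially also guesses `y <= 10` and `x < 10`. The checker refutes these two predicates with the successor states (6, 12) and (10, 20). After that, the remaining conjunction is inductive and implies the assertion `y == 2*x`.

A bounded search is only a stand-in for a real prover: it can miss counterexamples outside `BOUND`.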
