An open problem in program verification is to verify properties of computer programs that manipulate memory on the heap. A key challenge is to find formal descriptions of the instantiated data structures, which serve as input to a proof procedure that verifies the program. Standard approaches either severely restrict the space of data structures that can be recognized (e.g., to just lists) or use search guided by hand-specified heuristics. In this work, we explore a machine learning-based approach, where we execute the program and then learn to map the state of heap memory (represented as a labelled directed graph) to a logical description of the instantiated data structures. We formulate the learning problem as one of mapping from graphs to an element of a grammar over data structure descriptions. We report preliminary empirical results showing this to be a promising new direction.

Proceedings of the Constructive Machine Learning workshop @ ICML 2015. Copyright 2015 by the author(s).

1. Problem Description

Consider the following program fragment:

    int maxLen(ToL* t) {
      if (t == NULL)
        return 0;
      int len = length(t->value);
      int rLen = maxLen(t->right);
      int lLen = maxLen(t->left);
      int cLen = rLen > lLen ? rLen : lLen;
      return (len > cLen ? len : cLen);
    }

where length computes the length of a list. Suppose that we would like to prove that all pointer dereferences in this program are valid and will not cause the program to crash. How can we characterize the program’s heap at the beginning of maxLen so as to guarantee this property? One answer is “a binary tree of lists”, i.e., a nested data structure whose top level is a binary tree in which each tree node points to a (linked) list. In separation logic, a logic used to describe
heap structures, this can be expressed as follows:

    listtree(x) ≡ x = NULL ∨ ∃v, l, r. list(v) ∗ listtree(l) ∗ listtree(r) ∗ x ↦ {val : v, left : l, right : r}
    list(x) ≡ x = NULL ∨ ∃v, n. list(n) ∗ x ↦ {val : v, next : n}

Here, x ↦ {val : v, next : n} means that x points to a memory region containing a structure with val and next fields whose values are v and n, respectively. The ∗ connective is a conjunction, like ∧ in Boolean logic, but additionally requires that its operands refer to “separate” (disjoint) parts of the heap. Thus, listtree(x) implies that x is either NULL or points to a structure on the heap holding three values v, l, and r, where the “value” v has to satisfy the list predicate and l and r are in turn described by listtree. Under the assumption listtree(x), we can then prove that no program run will fail due to dereferencing an unallocated memory address (a property called memory safety) using a Hoare-style verification scheme (Hoare, 1969).

The hardest part of this process is coming up with the description of the data structures, and this is where we propose to use machine learning. Given a candidate description, tools from static program verification (Piskac et al., 2014) can determine whether the description is accurate and whether it allows one to prove that the program satisfies the desired properties (e.g., memory safety). Thus, from the machine learning perspective, we can focus on generating a small number of candidate descriptions; if any one of them is correct, we have succeeded on the example.

Given a new program at test time, we will run it a small number of times, extract the state of memory at relevant program locations (e.g., at the beginning of method calls), and then predict a separation logic formula. In reality, we will map from several memory states observed at the same program location to a separation logic formula, but for simplicity we treat the problem in this paper as mapping from a single memory state to a separation logic formula.
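To make the role of the ∗ connective concrete, the listtree and list predicates above can be read operationally as a check on one concrete heap state. The following is a minimal sketch of such a check, not the verification procedure used in this work: the struct layouts, the fixed-capacity Footprint set, and the names claim, is_list, is_listtree, and check_listtree are all assumptions made for illustration. Each heap cell may be claimed at most once, which is how the sketch approximates the requirement that the operands of ∗ describe disjoint parts of the heap.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical layouts: the text only names the fields, not the types. */
typedef struct List { int val; struct List *next; } List;
typedef struct ToL { List *value; struct ToL *left; struct ToL *right; } ToL;

/* Footprint: the set of heap cells claimed so far (assumed fixed capacity). */
#define MAX_CELLS 1024
typedef struct { const void *cells[MAX_CELLS]; size_t count; } Footprint;

/* Claim a cell. Fails if the cell was already claimed, which would
   violate separation, or if the footprint is full. */
static bool claim(Footprint *fp, const void *cell) {
    assert(fp != NULL);
    for (size_t i = 0; i < fp->count; i++)
        if (fp->cells[i] == cell) return false;
    if (fp->count == MAX_CELLS) return false;
    fp->cells[fp->count++] = cell;
    return true;
}

/* list(x): x is NULL, or x is a fresh cell whose next field is a list. */
static bool is_list(Footprint *fp, const List *x) {
    while (x != NULL) {
        if (!claim(fp, x)) return false;  /* cycle or sharing detected */
        x = x->next;
    }
    return true;
}

/* listtree(x): x is NULL, or x is a fresh cell whose value is a list and
   whose left and right fields are listtrees, all on disjoint cells. */
static bool is_listtree(Footprint *fp, const ToL *x) {
    if (x == NULL) return true;
    return claim(fp, x)
        && is_list(fp, x->value)
        && is_listtree(fp, x->left)
        && is_listtree(fp, x->right);
}

/* Entry point: check one concrete heap state rooted at root. */
bool check_listtree(const ToL *root) {
    Footprint fp = { .count = 0 };
    return is_listtree(&fp, root);
}
```

A check of this kind only says whether a single observed heap state is consistent with a candidate description. Establishing that the description holds for all program runs still requires a static verifier, as discussed above.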
This paper describes our initial approach to this problem.

Learning to Decipher the Heap for Program Verification
[1] John C. Reynolds, et al. Separation logic: a logic for shared mutable data structures. Proceedings 17th Annual IEEE Symposium on Logic in Computer Science, 2002.
[2] Lauretta O. Osho, et al. Axiomatic Basis for Computer Programming. 2013.
[3] Ruzica Piskac, et al. GRASShopper - Complete Heap Verification with Mixed Specifications. TACAS, 2014.
[4] Nikolai Tillmann, et al. DySy: dynamic symbolic execution for invariant inference. ICSE, 2008.
[5] Corina S. Pasareanu, et al. Abstract pathfinder. SOEN, 2012.
[6] Alexander Aiken, et al. A Data Driven Approach for Algebraic Loop Invariants. ESOP, 2013.
[7] Peter W. O'Hearn, et al. Shape Analysis for Composite Data Structures. CAV, 2007.
[8] Peter W. O'Hearn, et al. Local Reasoning about Programs that Alter Data Structures. CSL, 2001.
[9] Christof Löding, et al. ICE: A Robust Framework for Learning Invariants. CAV, 2014.
[10] Alexander Aiken, et al. Interpolants as Classifiers. CAV, 2012.
[11] Stephen McCamant, et al. The Daikon system for dynamic detection of likely invariants. Sci. Comput. Program., 2007.
[12] Thomas Wies, et al. Learning Invariants using Decision Trees. ArXiv, 2015.