Symbolic Grey-Box Learning of Input-Output Relations

Learning of stateful models has been extensively used in verification. Applications include inference of interface invariants, learning-guided concolic execution, compositional verification, and regular model checking. Learning shows great promise for verification, but suffers from two fundamental limitations. First, learning stateful models over concrete alphabets does not scale in practice, as alphabets can be large or even infinite in size. Second, learning techniques produce conjectures, which might be neither over- nor under-approximations, but rather some mix of the two. The new technique we propose, Sigma*, overcomes these problems by combining black- and white-box analysis techniques: learning and abstraction. Such a grey-box setting allows inspection of the internal symbolic state of the program, allowing us to learn symbolic transducers with input and output alphabets ranging over finite sets of symbolic terms. The technique alternates between symbolic conjectures and sound over-approximations of the program. As such, it presents a novel twist on the more standard alternation between under- and over-approximations often used in verification. Sigma* is parameterized by an abstraction function and a class of symbolic transducers. In this paper, we develop Sigma* parameterized by a variant of predicate abstraction and by k-lookback symbolic transducers, a new variant of symbolic transducers for which we present learning and separation sequence computation algorithms. Verification of such transducers is, for instance, important for the security of web applications and might find applications in other areas of verification. The main technical result we present is that Sigma* is complete relative to the abstraction function.
