Neuro-Symbolic Execution: Augmenting Symbolic Execution with Neural Constraints

Symbolic execution is a powerful technique for program analysis. However, several limitations restrict its practical applicability: path explosion hampers scalability, each target language needs its own implementation, complex dependencies are hard to handle, and the theories supported by the underlying satisfiability checkers have limited expressiveness. Often, relationships between variables of interest cannot be expressed directly as purely symbolic constraints. To address this, we present a new approach, neuro-symbolic execution, which learns an approximation of the relationship between program values of interest as a neural network. We develop a procedure for checking the satisfiability of mixed constraints that involve both symbolic expressions and neural representations. We implement the approach in a tool called NEUEX, an extension of KLEE, a state-of-the-art dynamic symbolic execution (DSE) engine. NEUEX finds 33 exploits in a benchmark of 7 programs within 12 hours, a 94% improvement in bug-finding efficacy over vanilla KLEE. We show that the approach drives execution down difficult paths on which KLEE and other DSE extensions get stuck, eliminating limitations of purely SMT-based techniques.
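To make the idea of a mixed constraint concrete, the following is a minimal illustrative sketch and not NEUEX's implementation. It assumes a hypothetical target `program` whose loop is treated as opaque, a single linear neuron standing in for the neural network, and an invented symbolic constraint (x is an even integer in [0, 20]). The names `program`, `train_neuron`, `solve_mixed`, and `SCALE` are all assumptions made for illustration. The sketch learns the input-output relationship from concrete runs, searches for an input by gradient steps on the learned model, repairs the candidate so it meets the symbolic constraint, and then validates it by concrete execution because the learned model is only an approximation.

```python
import math

SCALE = 20.0  # inputs are divided by this so gradient descent stays stable


def program(x: int) -> int:
    """Hypothetical target: the loop makes y depend on x (y = 3*x + 2)."""
    y = 2
    for _ in range(x):
        y += 3
    return y


def train_neuron(samples, epochs=500, lr=0.1):
    """Fit y ~ w * (x / SCALE) + b by stochastic gradient descent."""
    w, b = 0.0, 0.0
    for _ in range(epochs):
        for x, y in samples:
            xs = x / SCALE
            err = (w * xs + b) - y
            w -= lr * err * xs
            b -= lr * err
    return w, b


def solve_mixed(w, b, target, lo, hi, steps=200, lr=0.5):
    """Search for an even integer x in [lo, hi] with model(x) >= target."""
    x = float(lo)
    for _ in range(steps):
        if w * x / SCALE + b >= target:
            break
        # Loss max(0, target - model(x)) has gradient -w / SCALE in x.
        x += lr * w / SCALE
        x = min(max(x, lo), hi)  # project onto the symbolic bounds
    candidate = math.ceil(x)
    if candidate % 2:  # repair the (assumed) parity constraint
        candidate += 1
    if not lo <= candidate <= hi:
        return None
    # Validate by concrete execution, since the model is only approximate.
    return candidate if program(candidate) >= target else None


# Training data comes from concrete executions of the target program.
samples = [(x, program(x)) for x in range(0, 21)]
w, b = train_neuron(samples)
candidate = solve_mixed(w, b, target=50, lo=0, hi=20)
print(candidate)
```

The final concrete check is the important design choice in this sketch: a candidate that satisfies the learned constraint is only a guess until the real program confirms it. A fuller treatment would presumably feed failed candidates back as new training samples, which is omitted here.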
