Neuro-Symbolic Execution: The Feasibility of an Inductive Approach to Symbolic Execution

Symbolic execution is a powerful technique for program analysis. However, several limitations restrict its practical applicability: path explosion limits scalability, implementations must be language-specific, complex dependencies are hard to handle, and the theories supported by the underlying satisfiability checkers have limited expressiveness. Often, relationships between variables of interest cannot be expressed directly as purely symbolic constraints. To address this, we present a new approach, neuro-symbolic execution, which learns an approximation of such a relationship as a neural network. It features a constraint solver that can solve mixed constraints involving both symbolic expressions and neural network representations. To do so, we envision such constraint solving as a procedure combining SMT solving and gradient-based optimization. We demonstrate the utility of neuro-symbolic execution in constructing exploits for buffer overflows. We report success on 13/14 programs that have difficult constraints known to require specialized extensions to symbolic execution. In addition, our technique solves 100% of the given neuro-symbolic constraints in 73 programs from standard verification and invariant synthesis benchmarks.
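
As a rough illustration of what solving a mixed constraint might look like, the following minimal sketch combines a gradient-based search over a neural constraint with a check of a symbolic constraint. It is not the paper's algorithm. Everything in it is an assumption made for illustration: the toy network `net` (hand-picked weights that compute |x|), the constraints `net(x) == 7` and `x + y == 10` with `0 <= y <= 5`, the starting points, and the brute-force enumeration in `solve_symbolic`, which merely stands in for a real SMT query.

```python
def relu(v):
    return max(0.0, v)


def net(x):
    # Hypothetical stand-in for a learned network: these weights compute |x|.
    return relu(x) + relu(-x)


def neural_loss(x, target=7.0):
    # Zero exactly when the (assumed) neural constraint net(x) == target holds.
    return (net(x) - target) ** 2


def grad(f, x, eps=1e-4):
    # Central-difference estimate of df/dx.
    return (f(x + eps) - f(x - eps)) / (2 * eps)


def solve_neural(x0, steps=200, lr=0.1):
    # Plain gradient descent on the neural-constraint loss.
    x = x0
    for _ in range(steps):
        x -= lr * grad(neural_loss, x)
    return x


def solve_symbolic(x, y_range=range(0, 6)):
    # Stand-in for an SMT query: find an integer y in 0..5 with x + y == 10.
    for y in y_range:
        if x + y == 10:
            return y
    return None


def solve_mixed(starts=(-1.0, 1.0)):
    # Try each starting point; keep a candidate only if both parts are satisfied.
    for x0 in starts:
        x = round(solve_neural(x0))
        if abs(net(x) - 7.0) > 1e-6:
            continue  # neural constraint not met after rounding
        y = solve_symbolic(x)
        if y is not None:
            return x, y
    return None


print(solve_mixed())
```

In this toy setup, the search started at -1.0 settles on x = -7, which satisfies the neural constraint but leaves no admissible y, so the loop falls through to the second starting point. A real system would replace the enumeration with an SMT solver and the hand-written network with a trained one.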
