Loop-extended symbolic execution on binary programs

Mixed concrete and symbolic execution is an important technique for finding and understanding software bugs, including security-relevant ones. However, existing symbolic execution techniques are limited to examining one execution path at a time, in which symbolic variables reflect only direct data dependencies. We introduce loop-extended symbolic execution, a generalization that broadens the coverage of symbolic results in programs with loops. It introduces symbolic variables for the number of times each loop executes, and links these with features of a known input grammar such as variable-length or repeating fields. This allows the symbolic constraints to cover a class of paths that includes different numbers of loop iterations, expressing loop-dependent program values in terms of properties of the input. By performing more reasoning symbolically, instead of by undirected exploration, applications of loop-extended symbolic execution can achieve better results and/or require fewer program executions. To demonstrate our technique, we apply it to the problem of discovering and diagnosing buffer-overflow vulnerabilities in software given only in binary form. Our tool finds vulnerabilities in both a standard benchmark suite and 3 real-world applications, after generating only a handful of candidate inputs, and also diagnoses general vulnerability conditions.

[1]  Nicolas Halbwachs,et al.  Automatic discovery of linear restraints among variables of a program , 1978, POPL.

[2]  Murray Hill,et al.  Yacc: Yet Another Compiler-Compiler , 1978 .

[3]  Alfred V. Aho,et al.  Compilers: Principles, Techniques, and Tools , 1986, Addison-Wesley series in computer science / World student series edition.

[4]  Dawson R. Engler,et al.  ARCHER: using symbolic, path-sensitive analysis to detect memory access errors , 2003, ESEC/FSE-11.

[5]  Sumit Gulwani,et al.  Discovering affine equalities using random interpretation , 2003, POPL '03.

[6]  Sumit Gulwani,et al.  Discovering affine equalities using random interpretation , 2003, POPL '03.

[7]  Somesh Jha,et al.  Buffer overrun detection using linear programming and static analysis , 2003, CCS '03.

[8]  Stefan Savage,et al.  Inside the Slammer Worm , 2003, IEEE Secur. Priv..

[9]  Michael Rodeh,et al.  CSSV: towards a realistic tool for statically detecting all buffer overflows in C , 2003, PLDI '03.

[10]  Richard Lippmann,et al.  Testing static analysis tools using exploitable buffer overflows from open source code , 2004, SIGSOFT '04/FSE-12.

[11]  Michael Karr,et al.  Affine relationships among variables of a program , 1976, Acta Informatica.

[12]  Thomas W. Reps,et al.  Analyzing Memory Accesses in x86 Executables , 2004, CC.

[13]  Koushik Sen,et al.  CUTE: a concolic unit testing engine for C , 2005, ESEC/FSE-13.

[14]  Dawson R. Engler,et al.  Execution Generated Test Cases: How to Make Systems Code Crash Itself , 2005, SPIN.

[15]  James Newsome,et al.  Dynamic Taint Analysis for Automatic Detection, Analysis, and SignatureGeneration of Exploits on Commodity Software , 2005, NDSS.

[16]  Koushik Sen,et al.  DART: directed automated random testing , 2005, PLDI '05.

[17]  Jun Xu,et al.  Non-Control-Data Attacks Are Realistic Threats , 2005, USENIX Security Symposium.

[18]  Miguel Castro,et al.  Vigilante: end-to-end containment of internet worms , 2005, SOSP '05.

[19]  Frederic T. Chong,et al.  Minos: Architectural support for protecting control data , 2006, TACO.

[20]  Hao Wang,et al.  Towards automatic generation of vulnerability-based signatures , 2006, 2006 IEEE Symposium on Security and Privacy (S&P'06).

[21]  Antoine Miné,et al.  The octagon abstract domain , 2001, High. Order Symb. Comput..

[22]  Alfred V. Aho,et al.  Compilers: Principles, Techniques, and Tools (2nd Edition) , 2006 .

[23]  Patrice Godefroid,et al.  Compositional dynamic test generation , 2007, POPL '07.

[24]  Directed test generation using symbolic grammars , 2007, ESEC-FSE '07.

[25]  Nicholas Nethercote,et al.  Valgrind: a framework for heavyweight dynamic binary instrumentation , 2007, PLDI '07.

[26]  Helen J. Wang,et al.  ShieldGen: Automatic Data Patch Generation for Unknown Vulnerabilities with Informed Probing , 2007, 2007 IEEE Symposium on Security and Privacy (SP '07).

[27]  Zhenkai Liang,et al.  Towards Automatic Discovery of Deviations in Binary Implementations with Applications to Error Detection and Fingerprint Generation , 2007, USENIX Security Symposium.

[28]  Rupak Majumdar,et al.  Directed test generation using symbolic grammars , 2007, ESEC-FSE companion '07.

[29]  Thomas W. Reps,et al.  DIVINE: DIscovering Variables IN Executables , 2007, VMCAI.

[30]  David L. Dill,et al.  A Decision Procedure for Bit-Vectors and Arrays , 2007, CAV.

[31]  Hao Wang,et al.  Creating Vulnerability Signatures Using Weakest Preconditions , 2007, 20th IEEE Computer Security Foundations Symposium (CSF'07).

[32]  Helmut Seidl,et al.  Analysis of modular arithmetic , 2005, TOPL.

[33]  Zhenkai Liang,et al.  Polyglot: automatic extraction of protocol message format using dynamic binary analysis , 2007, CCS '07.

[34]  Christopher Krügel,et al.  Automatic Network Protocol Analysis , 2008, NDSS.

[35]  Dawson R. Engler,et al.  EXE: automatically generating inputs of death , 2006, CCS '06.

[36]  David Brumley,et al.  Automatic Patch-Based Exploit Generation is Possible: Techniques and Implications , 2008, 2008 IEEE Symposium on Security and Privacy (sp 2008).

[37]  Patrice Godefroid,et al.  Automated Whitebox Fuzz Testing , 2008, NDSS.

[38]  R. Sekar,et al.  Efficient fine-grained binary instrumentationwith applications to taint-tracking , 2008, CGO '08.

[39]  Zhenkai Liang,et al.  BitBlaze: A New Approach to Computer Security via Binary Analysis , 2008, ICISS.

[40]  Manuel Costa,et al.  Bouncer: securing software by blocking bad input , 2008, WRAITS '08.

[41]  Xuxian Jiang,et al.  Automatic Protocol Format Reverse Engineering through Context-Aware Monitored Execution , 2008, NDSS.

[42]  Adam Kiezun,et al.  Grammar-based whitebox fuzzing , 2008, PLDI '08.

[43]  Manuel Fähndrich,et al.  Pentagons: a weakly relational abstract domain for the efficient validation of array accesses , 2008, SAC '08.

[44]  Rupak Majumdar,et al.  Testing for buffer overflows with length abstraction , 2008, ISSTA '08.

[45]  Nikolaj Bjørner,et al.  Path Feasibility Analysis for String-Manipulating Programs , 2009, TACAS.

[46]  Zhenkai Liang,et al.  Towards Generating High Coverage Vulnerability-Based Signatures with Protocol-Level Constraint-Guided Exploration , 2009, RAID.