State Joining and Splitting for the Symbolic Execution of Binaries

Symbolic execution can be used to explore the possible run-time states of a program. It relies on a notion of "state" in which each variable's value is replaced by an expression that gives the value as a function of the program's input. A state can also carry a summary of its control-flow history: a "path constraint" that characterizes the class of inputs that would have caused the same flow of control. But even simple programs can have trillions of paths, so a path-by-path analysis is impractical. We investigate a "state joining" approach to making symbolic execution more practical and describe the challenges of applying state joining to the analysis of unmodified Linux x86 executables. The results so far are mixed: state joining works well on some code, but on other examples it produces cumbersome constraints that are more expensive to solve than those generated by normal symbolic execution.
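
The idea of joining two states at a control-flow merge point can be illustrated with a small sketch. This is not the implementation described in the paper. It assumes one common formulation in which a joined variable becomes an if-then-else expression guarded by one state's path constraint, and the joined path constraint is the disjunction of the two. The names `State`, `join`, and `evaluate`, and the tuple encoding of expressions, are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass

# Symbolic expressions are nested tuples such as ("add", "x", 1).
# Leaves are integer constants or strings naming a program input.

def evaluate(expr, inputs):
    """Evaluate a symbolic expression under a concrete input assignment."""
    if isinstance(expr, int):          # constant (also covers bool)
        return expr
    if isinstance(expr, str):          # program input
        return inputs[expr]
    op, *args = expr
    vals = [evaluate(a, inputs) for a in args]
    if op == "add":
        return vals[0] + vals[1]
    if op == "lt":
        return vals[0] < vals[1]
    if op == "not":
        return not vals[0]
    if op == "or":
        return vals[0] or vals[1]
    if op == "ite":                    # if-then-else
        return vals[1] if vals[0] else vals[2]
    raise ValueError(f"unknown operator: {op}")

@dataclass
class State:
    path_constraint: object            # which inputs reach this state
    variables: dict                    # variable name -> symbolic expression

def join(a, b):
    """Join two states that reach the same program point (assumed scheme).

    Both states are assumed to define the same set of variables.
    """
    merged = {}
    for name, a_val in a.variables.items():
        b_val = b.variables[name]
        if a_val == b_val:
            merged[name] = a_val       # identical on both paths: no guard needed
        else:
            merged[name] = ("ite", a.path_constraint, a_val, b_val)
    return State(("or", a.path_constraint, b.path_constraint), merged)

# Two paths through:  if x < 10: y = x + 1  else: y = 0
taken = State(("lt", "x", 10), {"y": ("add", "x", 1)})
not_taken = State(("not", ("lt", "x", 10)), {"y": 0})
joined = join(taken, not_taken)

print(evaluate(joined.variables["y"], {"x": 3}))   # prints 4
print(evaluate(joined.variables["y"], {"x": 20}))  # prints 0
```

Under this scheme a single joined state stands in for both paths, so the number of states no longer doubles at each branch. The cost is that the if-then-else terms nest with every further join, which is one way a joined state can end up with the cumbersome constraints mentioned above.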
