There's Plenty of Room at the Bottom: Analyzing and Verifying Machine Code

This paper discusses the obstacles that stand in the way of doing a good job of machine-code analysis. Compared with analysis of source code, the challenge is to drop all assumptions about having certain kinds of information available (variables, control-flow graph, call-graph, etc.) and also to address new kinds of behaviors (arithmetic on addresses, jumps to “hidden” instructions starting at positions that are out of registration with the instruction boundaries of a given reading of an instruction stream, self-modifying code, etc.). The paper describes some of the challenges that arise when analyzing machine code, and what can be done about them. It also provides a rationale for some of the design decisions made in the machine-code-analysis tools that we have built over the past few years.
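To make the notion of a “hidden” instruction concrete, the following is a minimal sketch, not taken from the paper or its tools. It assumes 32-bit x86 encodings for just three instructions (`EB rel8` for a short jump, `FF C0` for `inc eax`, `48` for `dec eax`); the names `decode_at` and `linear_sweep` are hypothetical, and any byte outside this tiny table is simply printed as data.

```python
# Toy decoder for a three-instruction subset of 32-bit x86.
# Illustration only: the table below is an assumption-laden sketch,
# not a real disassembler.

def decode_at(code: bytes, offset: int):
    """Return (mnemonic, length) for the instruction starting at offset."""
    op = code[offset]
    has_next = offset + 1 < len(code)
    if op == 0xEB and has_next:
        # jmp rel8: target is relative to the end of this 2-byte instruction.
        rel = int.from_bytes(code[offset + 1:offset + 2], "little", signed=True)
        return (f"jmp {offset + 2 + rel:#x}", 2)
    if op == 0xFF and has_next and code[offset + 1] == 0xC0:
        return ("inc eax", 2)
    if op == 0x48:
        return ("dec eax", 1)
    # Anything else is outside the toy table; show it as a raw data byte.
    return (f"db {op:#x}", 1)


def linear_sweep(code: bytes, start: int = 0):
    """Decode instructions back to back, beginning at start."""
    listing = []
    offset = start
    while offset < len(code):
        mnemonic, length = decode_at(code, offset)
        listing.append((offset, mnemonic))
        offset += length
    return listing


# Four bytes whose first instruction jumps into its own second byte.
CODE = bytes([0xEB, 0xFF, 0xC0, 0x48])

for start in (0, 1):
    print(f"reading from offset {start}:")
    for offset, mnemonic in linear_sweep(CODE, start):
        print(f"  {offset:#04x}  {mnemonic}")
```

Read from offset 0, the stream begins with a jump to offset 1, which lies inside the jump instruction itself. Read from offset 1, the same bytes yield `inc eax`, an instruction that never appears in the first listing. An analysis that commits to a single reading of the instruction stream would miss it.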
