TIE: Principled Reverse Engineering of Types in Binary Programs

A recurring problem in security is reverse engineering binary code to recover high-level language data abstractions and types. High-level programming languages have data abstractions such as buffers, structures, and local variables that all help programmers and program analyses reason about programs in a scalable manner. During compilation, these abstractions are removed as code is translated down to operations on registers and one globally addressed memory region. Reverse engineering consists of “undoing” the compilation to recover high-level information so that programmers, security professionals, and analyses can all more easily reason about the binary code. In this paper we develop novel techniques for reverse engineering data type abstractions from binary programs. At the heart of our approach is a novel type reconstruction system based upon binary code analysis. Our techniques and system can be applied as part of both static or dynamic analysis, thus are extensible to a large number of security settings. Our results on 87 programs show that TIE is both more accurate and more precise at recovering high-level types than existing mechanisms.

[1]  Andrew W. Appel,et al.  Modern Compiler Implementation in ML , 1997 .

[2]  David Walker,et al.  Stack-based typed assembly language , 1998, Journal of Functional Programming.

[3]  Frank Tip,et al.  Aggregate structure identification and its application to program analysis , 1999, POPL '99.

[4]  Alan Mycroft,et al.  Type-Based Decompilation (or Program Reconstruction via Type Reconstruction) , 1999, ESOP.

[5]  MorrisettGreg,et al.  From system F to typed assembly language , 1999 .

[6]  Benjamin C. Pierce,et al.  Types and programming languages: the next generation , 2003, 18th Annual IEEE Symposium of Logic in Computer Science, 2003. Proceedings..

[7]  Karl Crary,et al.  Toward a foundational typed assembly language , 2003, POPL '03.

[8]  Benjamin C. Pierce,et al.  Advanced Topics In Types And Programming Languages , 2004 .

[9]  Christopher Krügel,et al.  Static Disassembly of Obfuscated Binaries , 2004, USENIX Security Symposium.

[10]  Thomas W. Reps,et al.  Analyzing Memory Accesses in x86 Executables , 2004, CC.

[11]  Harish Patil,et al.  Pin: building customized program analysis tools with dynamic instrumentation , 2005, PLDI '05.

[12]  Thomas W. Reps,et al.  DIVINE: DIscovering Variables IN Executables , 2007, VMCAI.

[13]  A. V. Chernov,et al.  Automatic reconstruction of data types in the decompilation problem , 2009, Programming and Computer Software.

[14]  Xiangyu Zhang,et al.  Automatic Reverse Engineering of Data Structures from Binary Execution , 2010, NDSS.

[15]  Thomas W. Reps,et al.  WYSINWYX: What you see is not what you eXecute , 2005, TOPL.