Scalable variable and data type detection in a binary rewriter

We present scalable static analyses that recover variables, data types, and function prototypes from stripped x86 executables (those without symbol or debug information) and produce a functional intermediate representation (IR) for analysis and rewriting. On average, our techniques run 352x faster than current techniques while retaining the same precision. This makes it possible to analyze executables with millions of instructions in minutes, which is not possible with existing techniques. Unlike current techniques, ours can also recover variables allocated to the floating-point stack. We have integrated our techniques to obtain a compiler-level IR that, when recompiled, works correctly and produces the same output as the input executable. We demonstrate the scalability, precision, and correctness of the proposed techniques by evaluating them on the complete SPEC2006 benchmark suite.
