A compiler-level intermediate representation based binary analysis and rewriting system

This paper presents the component techniques needed to convert executables to the high-level intermediate representation (IR) of an existing compiler. The compiler IR is then employed for three distinct applications: binary rewriting using the compiler's binary back-end, vulnerability detection using source-level symbolic execution, and source-code recovery using the compiler's C back-end. Our techniques enable complex high-level transformations not possible in existing binary systems, address a major challenge of input-derived memory addresses in symbolic execution, and are the first to enable recovery of fully functional source code. We present techniques to segment the flat address space of an executable, which contains undifferentiated blocks of memory. We demonstrate that existing variable identification methods are inadequate for promoting the identified variables to symbols, and we present our own methods for symbol promotion. We also present methods to convert the physically addressed stack in an executable (with a stack pointer) to an abstract stack (without a stack pointer). Our methods do not use symbolic, relocation, or debug information, since these are usually absent in deployed executables. We have integrated our techniques into a prototype x86 binary framework called SecondWrite, which uses LLVM as its IR. The robustness of the framework is demonstrated by handling executables totaling more than a million lines of source code, produced by two different compilers (gcc and the Microsoft Visual Studio compiler) from three languages (C, C++, and Fortran) for two operating systems (Windows and Linux), and including a real-world program (the Apache server).
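To make the idea of replacing a physically addressed stack with an abstract one concrete, the following is a minimal sketch and not the algorithm used in SecondWrite. It assumes a hypothetical toy instruction format (`sub_esp`, `add_esp`, `load`, `store`) and a hypothetical helper `abstract_stack`; neither appears in the paper. The sketch tracks how far the stack pointer has moved since function entry and renames every stack-pointer-relative access to a named slot, so the stack pointer disappears from the output.

```python
from typing import Dict, List, Tuple

# Hypothetical toy instruction forms (assumptions for illustration only):
#   ("sub_esp", n)       esp -= n
#   ("add_esp", n)       esp += n
#   ("store", disp, v)   write v to memory at esp + disp
#   ("load", disp)       read memory at esp + disp
Instr = Tuple


def slot_name(offset: int) -> str:
    """Name a stack slot by its offset from esp at function entry."""
    if offset < 0:
        return f"var_{-offset}"   # below the entry esp: a local
    if offset == 0:
        return "ret_addr"         # on x86, the entry esp points at the return address
    return f"arg_{offset}"        # above the entry esp: an incoming argument


def abstract_stack(instrs: List[Instr]) -> List[Instr]:
    """Rewrite esp-relative accesses as accesses to named slots."""
    delta = 0                      # current esp minus esp at function entry
    slots: Dict[int, str] = {}     # entry-relative offset -> slot name
    out: List[Instr] = []
    for ins in instrs:
        op = ins[0]
        if op == "sub_esp":
            delta -= ins[1]        # stack adjustment is absorbed, not emitted
        elif op == "add_esp":
            delta += ins[1]
        elif op == "store":
            name = slots.setdefault(delta + ins[1], slot_name(delta + ins[1]))
            out.append(("store", name, ins[2]))
        elif op == "load":
            name = slots.setdefault(delta + ins[1], slot_name(delta + ins[1]))
            out.append(("load", name))
        else:
            raise ValueError(f"unknown toy instruction: {op}")
    return out


if __name__ == "__main__":
    body = [
        ("sub_esp", 8),
        ("store", 0, 1),   # esp+0 after the first adjustment
        ("sub_esp", 4),
        ("load", 4),       # same physical slot, different displacement
        ("load", 16),      # reaches above the entry esp: an argument
        ("add_esp", 12),
    ]
    for ins in abstract_stack(body):
        print(ins)
```

In the demo, the store at `esp+0` and the later load at `esp+4` resolve to the same slot, because the stack pointer moved by 4 in between. This sketch only works when every stack-pointer adjustment is a known constant; a real binary also has calls, pushes and pops, frame pointers, and stack adjustments whose size is not statically known, none of which are modeled here.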
