Decompilation to Compiler High IR in a binary rewriter Kapil

A binary rewriter is a piece of software that accepts a binary executable program as input and produces an improved executable as output. This paper describes the first technique in the literature to decompile the input binary into an existing compiler's high-level intermediate representation (IR). The compiler's back-end is then used to generate the output binary from the IR. Doing so gives the rewriter access to the rich set of analysis and transformation passes available in mature compilers. It also enables complex high-level transformations, such as automatic parallelization, that are not possible in existing binary rewriters. Certain characteristics of binary code make translation to a high-level compiler IR difficult: the use of an explicitly addressed stack, the lack of function prototypes, and the lack of symbols. We present techniques to overcome these challenges. We have built a prototype binary rewriter called SecondWrite that uses LLVM, a widely used compiler infrastructure, as its IR, and rewrites x86 binaries. Our results show that SecondWrite accelerates unoptimized binaries by 27% on average for our benchmarks, and maintains the performance of already optimized binaries, without any custom optimizations on our part. We also present two case studies of custom improvements, automatic parallelization and security, to illustrate the benefits and applications of a binary rewriter that uses a high-level IR.
