RetroWrite: Statically Instrumenting COTS Binaries for Fuzzing and Sanitization

Analyzing the security of closed source binaries is currently impractical for end-users, or even developers who rely on third-party libraries. Such analysis relies on automatic vulnerability discovery techniques, most notably fuzzing with sanitizers enabled. The current state of the art for applying fuzzing or sanitization to binaries is dynamic binary translation, which has prohibitive performance overhead. The alternate technique, static binary rewriting, cannot fully recover symbolization information and hence has difficulty modifying binaries to track code coverage for fuzzing or to add security checks for sanitizers.The ideal solution for binary security analysis would be a static rewriter that can intelligently add the required instrumentation as if it were inserted at compile time. Such instrumentation requires an analysis to statically disambiguate between references and scalars, a problem known to be undecidable in the general case. We show that recovering this information is possible in practice for the most common class of software and libraries: 64-bit, position independent code. Based on this observation, we develop RetroWrite, a binary-rewriting instrumentation to support American Fuzzy Lop (AFL) and Address Sanitizer (ASan), and show that it can achieve compiler-level performance while retaining precision. Binaries rewritten for coverage-guided fuzzing using RetroWrite are identical in performance to compiler-instrumented binaries and outperform the default QEMU-based instrumentation by 4.5x while triggering more bugs. Our implementation of binary-only Address Sanitizer is 3x faster than Valgrind’s memcheck, the state-of-the-art binary-only memory checker, and detects 80% more bugs in our evaluation.

[1]  Barton P. Miller,et al.  Practical analysis of stripped binary code , 2005, CARN.

[2]  Abhik Roychoudhury,et al.  Directed Greybox Fuzzing , 2017, CCS.

[3]  Christopher Krügel,et al.  Firmalice - Automatic Detection of Authentication Bypass Vulnerabilities in Binary Firmware , 2015, NDSS.

[4]  Derek Bruening,et al.  Efficient, transparent, and comprehensive runtime code manipulation , 2004 .

[5]  Alec Wolman,et al.  Instrumentation and optimization of Win32/intel executables using Etch , 1997 .

[6]  Vikram S. Adve,et al.  LLVM: a compilation framework for lifelong program analysis & transformation , 2004, International Symposium on Code Generation and Optimization, 2004. CGO 2004..

[7]  Mingwei Zhang,et al.  Control Flow Integrity for COTS Binaries , 2013, USENIX Security Symposium.

[8]  Helmut Veith,et al.  Jakstab: A Static Analysis Platform for Binaries , 2008, CAV.

[9]  Meng Xu,et al.  QSYM : A Practical Concolic Execution Engine Tailored for Hybrid Fuzzing , 2018, USENIX Security Symposium.

[10]  Thomas W. Reps,et al.  DIVINE: DIscovering Variables IN Executables , 2007, VMCAI.

[11]  Cristina Cifuentes,et al.  Recovery of jump table case statements from binary code , 1999, Proceedings Seventh International Workshop on Program Comprehension.

[12]  Kevin W. Hamlen,et al.  Superset Disassembly: Statically Rewriting x86 Binaries Without Heuristics , 2018, NDSS.

[13]  Mathias Payer,et al.  T-Fuzz: Fuzzing by Program Transformation , 2018, 2018 IEEE Symposium on Security and Privacy (SP).

[14]  Crispan Cowan,et al.  StackGuard: Automatic Adaptive Detection and Prevention of Buffer-Overflow Attacks , 1998, USENIX Security Symposium.

[15]  Angelos Stavrou,et al.  Strict Virtual Call Integrity Checking for C++ Binaries , 2017, AsiaCCS.

[16]  Xiangyu Zhang,et al.  BISTRO: Binary Component Extraction and Embedding for Software Security Applications , 2013, ESORICS.

[17]  Nicholas Nethercote,et al.  Valgrind: a framework for heavyweight dynamic binary instrumentation , 2007, PLDI '07.

[18]  Thomas W. Reps,et al.  WYSINWYX: What you see is not what you eXecute , 2005, TOPL.

[19]  Derek Bruening,et al.  AddressSanitizer: A Fast Address Sanity Checker , 2012, USENIX Annual Technical Conference.

[20]  Dinghao Wu,et al.  Reassembleable Disassembling , 2015, USENIX Security Symposium.

[21]  David Brumley,et al.  TIE: Principled Reverse Engineering of Types in Binary Programs , 2011, NDSS.

[22]  Rajeev Barua,et al.  Static binary rewriting without supplemental information: Overcoming the tradeoff between coverage and correctness , 2013, 2013 20th Working Conference on Reverse Engineering (WCRE).

[23]  William K. Robertson,et al.  LAVA: Large-Scale Automated Vulnerability Addition , 2016, 2016 IEEE Symposium on Security and Privacy (SP).

[24]  Derek Bruening,et al.  An infrastructure for adaptive dynamic optimization , 2003, International Symposium on Code Generation and Optimization, 2003. CGO 2003..

[25]  Rajiv Kapoor,et al.  Pinpointing Representative Portions of Large Intel® Itanium® Programs with Dynamic Instrumentation , 2004, 37th International Symposium on Microarchitecture (MICRO-37'04).

[26]  Xi Chen,et al.  An In-Depth Analysis of Disassembly on Full-Scale x86/x64 Binaries , 2016, USENIX Security Symposium.

[27]  Gregory R. Andrews,et al.  Disassembly of executable code revisited , 2002, Ninth Working Conference on Reverse Engineering, 2002. Proceedings..

[28]  Herbert Bos,et al.  MARX: Uncovering Class Hierarchies in C++ Programs , 2017, NDSS.

[29]  G. Ramalingam,et al.  The undecidability of aliasing , 1994, TOPL.

[30]  Barton P. Miller,et al.  Learning to Analyze Binary Computer Code , 2008, AAAI.

[31]  Gang-Ryung Uh,et al.  Analyzing Dynamic Binary Instrumentation Overhead , 2007 .

[32]  Christopher Krügel,et al.  SOK: (State of) The Art of War: Offensive Techniques in Binary Analysis , 2016, 2016 IEEE Symposium on Security and Privacy (SP).

[33]  Thomas Dullien,et al.  REIL: A platform-independent intermediate representation of disassembled code for static code analysis , 2009 .

[34]  Xiangyu Zhang,et al.  Automatic Reverse Engineering of Data Structures from Binary Execution , 2010, NDSS.

[35]  Axel Simon,et al.  Precise Static Analysis of Binaries by Extracting Relational Information , 2011, 2011 18th Working Conference on Reverse Engineering.

[36]  Thomas R. Gross,et al.  Fine-grained user-space security through virtualization , 2011, VEE '11.

[37]  Kevin W. Hamlen,et al.  Securing untrusted code via compiler-agnostic binary rewriting , 2012, ACSAC '12.

[38]  Fei Peng,et al.  X-Force: Force-Executing Binary Programs for Security Applications , 2014, USENIX Security Symposium.

[39]  Christopher Krügel,et al.  Ramblr: Making Reassembly Great Again , 2017, NDSS.

[40]  R. Nigel Horspool,et al.  An Approach to the Problem of Detranslation of Computer Programs , 1980, Comput. J..

[41]  Christopher Krügel,et al.  Static Disassembly of Obfuscated Binaries , 2004, USENIX Security Symposium.

[42]  Alexey Loginov,et al.  Polymorphic type inference for machine code , 2016, PLDI.

[43]  Fabrice Bellard,et al.  QEMU, a Fast and Portable Dynamic Translator , 2005, USENIX Annual Technical Conference, FREENIX Track.

[44]  Frank Tip,et al.  Aggregate structure identification and its application to program analysis , 1999, POPL '99.

[45]  Koushik Sen,et al.  FairFuzz: A Targeted Mutation Strategy for Increasing Greybox Fuzz Testing Coverage , 2017, 2018 33rd IEEE/ACM International Conference on Automated Software Engineering (ASE).

[46]  Herbert Bos,et al.  Compiler-Agnostic Function Detection in Binaries , 2017, 2017 IEEE European Symposium on Security and Privacy (EuroS&P).

[47]  Giovanni Agosta,et al.  rev.ng: a unified binary analysis framework to recover CFGs and function boundaries , 2017, CC.

[48]  Mathias Payer,et al.  Control-Flow Integrity , 2017, ACM Comput. Surv..

[49]  Andrew Ruef,et al.  Evaluating Fuzz Testing , 2018, CCS.

[50]  Dawn Xiaodong Song,et al.  Recognizing Functions in Binaries with Neural Networks , 2015, USENIX Security Symposium.

[51]  Milo M. K. Martin,et al.  CETS: compiler enforced temporal safety for C , 2010, ISMM '10.

[52]  Milo M. K. Martin,et al.  SoftBound: highly compatible and complete spatial memory safety for c , 2009, PLDI '09.

[53]  Hao Chen,et al.  Angora: Efficient Fuzzing by Principled Search , 2018, 2018 IEEE Symposium on Security and Privacy (SP).

[54]  Dominik Stoffel,et al.  Speculative disassembly of binary code , 2016, 2016 International Conference on Compliers, Architectures, and Sythesis of Embedded Systems (CASES).

[55]  Bhavani M. Thuraisingham,et al.  Differentiating Code from Data in x86 Binaries , 2011, ECML/PKDD.

[56]  David Brumley,et al.  BYTEWEIGHT: Learning to Recognize Functions in Binary Code , 2014, USENIX Security Symposium.

[57]  Angelos D. Keromytis,et al.  Retrofitting Security in COTS Software with Binary Rewriting , 2011, SEC.

[58]  Mathias Payer,et al.  Control-Flow Integrity , 2017, ACM Comput. Surv..

[59]  Abhik Roychoudhury,et al.  Coverage-Based Greybox Fuzzing as Markov Chain , 2016, IEEE Transactions on Software Engineering.

[60]  Barton P. Miller,et al.  Anywhere, any-time binary instrumentation , 2011, PASTE '11.

[61]  Daniel C. DuVarney,et al.  Address Obfuscation: An Efficient Approach to Combat a Broad Range of Memory Error Exploits , 2003, USENIX Security Symposium.

[62]  Qin Zhao,et al.  Practical memory checking with Dr. Memory , 2011, International Symposium on Code Generation and Optimization (CGO 2011).

[63]  Mark N. Wegman,et al.  Efficiently computing static single assignment form and the control dependence graph , 1991, TOPL.

[64]  Johannes Kinder,et al.  Static Analysis of x86 Executables , 2010 .