Binary code is not easy

Binary code analysis is an enabling technique for many applications. Modern compilers and run-time libraries have introduced significant complexities to binary code. These complexities reduce the ability of binary analysis tool kits to analyze binary code and may cause tools to report inaccurate information about it. As a result, analysts may be confused, and applications built on these tool kits may degrade in quality. We examine the problem of constructing control flow graphs from binary code and labeling the graphs with accurate function boundary annotations. We identify several challenging code constructs that represent hard-to-analyze aspects of binary code and show code examples for each. As part of this discussion, we present new code parsing algorithms in our open source Dyninst tool kit that support these constructs, including a new model for describing jump tables that improves our ability to precisely determine control flow targets, a new interprocedural analysis to determine when a function is non-returning, and techniques for handling tail calls. We evaluate how various tool kits fare on these code constructs, using both real software and test binaries patterned after each challenging code construct we found in real software.
