Cross-Architecture Bug Search in Binary Executables

With the general availability of closed-source software for various CPU architectures, there is a need to identify security-critical vulnerabilities at the binary level to perform a vulnerability assessment. Unfortunately, existing bug finding methods fall short in that they i) require source code, ii) only work on a single architecture (typically x86), or iii) rely on dynamic analysis, which is inherently difficult for embedded devices. In this paper, we propose a system to derive bug signatures for known bugs. We then use these signatures to find bugs in binaries that have been deployed on different CPU architectures (e.g., x86 vs. MIPS). The variety of CPU architectures imposes many challenges, such as the incomparability of instruction set architectures between the CPU models. We solve this by first translating the binary code to an intermediate representation, resulting in assignment formulas with input and output variables. We then sample concrete inputs to observe the I/O behavior of basic blocks, which grasps their semantics. Finally, we use the I/O behavior to find code parts that behave similarly to the bug signature, effectively revealing code parts that contain the bug. We have designed and implemented a tool for cross architecture bug search in executables. Our prototype currently supports three instruction set architectures (x86, ARM, and MIPS) and can find vulnerabilities in buggy binary code for any of these architectures. We show that we can find Heart bleed vulnerabilities, regardless of the underlying software instruction set. Similarly, we apply our method to find backdoors in closed source firmware images of MIPS- and ARM-based routers.

[1]  Luca Bruno,et al.  AVATAR: A Framework to Support Dynamic Security Analysis of Embedded Systems' Firmwares , 2014, NDSS.

[2]  Thierry Lavoie,et al.  Uncovering access control weaknesses and flaws with security-discordant software clones , 2013, ACSAC.

[3]  David A. Wagner,et al.  Finding User/Kernel Pointer Bugs with Type Inference , 2004, USENIX Security Symposium.

[4]  Priya Narasimhan,et al.  Binary Function Clustering Using Semantic Hashes , 2012, 2012 11th International Conference on Machine Learning and Applications.

[5]  David Brumley,et al.  Unleashing Mayhem on Binary Code , 2012, 2012 IEEE Symposium on Security and Privacy.

[6]  David Brumley,et al.  Automatic exploit generation , 2014, CACM.

[7]  Ashish Goel,et al.  Dimension independent similarity computation , 2012, J. Mach. Learn. Res..

[8]  David A. Wagner,et al.  This copyright notice must be included in the reproduced paper. USENIX acknowledges all trademarks herein. Detecting Format String Vulnerabilities with Type Qualifiers , 2001 .

[9]  Konrad Rieck,et al.  Generalized vulnerability extrapolation using abstract syntax trees , 2012, ACSAC '12.

[10]  Christian S. Collberg,et al.  K-gram based software birthmarks , 2005, SAC '05.

[11]  Dawson R. Engler,et al.  A few billion lines of code later , 2010, Commun. ACM.

[12]  David Brumley,et al.  AEG: Automatic Exploit Generation , 2011, NDSS.

[13]  Zhendong Su,et al.  DECKARD: Scalable and Accurate Tree-Based Detection of Code Clones , 2007, 29th International Conference on Software Engineering (ICSE'07).

[14]  Atul Prakash,et al.  Expose: Discovering Potential Binary Code Re-use , 2013, 2013 IEEE 37th Annual Computer Software and Applications Conference.

[15]  Christian Rossow,et al.  Leveraging semantic signatures for bug search in binary programs , 2014, ACSAC.

[16]  Aurélien Francillon,et al.  A Large-Scale Analysis of the Security of Embedded Firmwares , 2014, USENIX Security Symposium.

[17]  Shinji Kusumoto,et al.  CCFinder: A Multilinguistic Token-Based Code Clone Detection System for Large Scale Source Code , 2002, IEEE Trans. Software Eng..

[18]  David Brumley,et al.  Blanket Execution: Dynamic Similarity Testing for Program Binaries and Components , 2014, USENIX Security Symposium.

[19]  Debin Gao,et al.  iBinHunt: Binary Hunting with Inter-procedural Control Flow , 2012, ICISC.

[20]  Benjamin Livshits,et al.  Finding Security Vulnerabilities in Java Applications with Static Analysis , 2005, USENIX Security Symposium.

[21]  András Frank,et al.  On Kuhn's Hungarian Method—A tribute from Hungary , 2005 .

[22]  Thomas Dullien,et al.  Graph-based comparison of Executable Objects , 2005 .

[23]  David Brumley,et al.  ReDeBug: Finding Unpatched Code Clones in Entire OS Distributions , 2012, 2012 IEEE Symposium on Security and Privacy.

[24]  Debin Gao,et al.  BinHunt: Automatically Finding Semantic Differences in Binary Programs , 2008, ICICS.

[25]  Herbert Bos,et al.  Memory Errors: The Past, the Present, and the Future , 2012, RAID.

[26]  Arun Lakhotia,et al.  Fast location of similar code fragments using semantic 'juice' , 2013, PPREW '13.

[27]  David Brumley,et al.  BYTEWEIGHT: Learning to Recognize Functions in Binary Code , 2014, USENIX Security Symposium.

[28]  Andrei Z. Broder,et al.  On the resemblance and containment of documents , 1997, Proceedings. Compression and Complexity of SEQUENCES 1997 (Cat. No.97TB100171).

[29]  Muhammad Zubair Shafiq,et al.  Using spatio-temporal information in API calls with machine learning algorithms for malware detection , 2009, AISec '09.

[30]  Sebastian Spaeth,et al.  Code Reuse in Open Source Software , 2008, Manag. Sci..

[31]  Barton P. Miller,et al.  An empirical study of the reliability of UNIX utilities , 1990, Commun. ACM.

[32]  Michael Tüxen,et al.  Transport Layer Security (TLS) and Datagram Transport Layer Security (DTLS) Heartbeat Extension , 2012, RFC.