OSPREY: Recovery of Variable and Data Structure via Probabilistic Analysis for Stripped Binary

Recovering variable and data structure information from stripped binaries is a prominent challenge in binary program analysis. While various state-of-the-art techniques are effective in specific settings, that effectiveness may not generalize. The main reason is that the problem is inherently uncertain because information is lost during compilation. Most existing techniques are deterministic and lack a systematic way of handling such uncertainty. We propose a novel probabilistic technique for variable and structure recovery. Random variables are introduced to denote the likelihood of an abstract memory location having various types and structural properties, such as being a field of some data structure. These random variables are connected through probabilistic constraints derived through program analysis. Solving these constraints produces the posterior probabilities of the random variables, which essentially denote the recovery results. Our experiments show that our technique substantially outperforms a number of state-of-the-art systems, including IDA, Ghidra, Angr, and Howard. Our case studies demonstrate that the recovered information improves binary code hardening and binary decompilation.
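
As a rough illustration of the idea of random variables tied together by probabilistic constraints, the following minimal Python sketch computes posterior marginals for two Boolean variables. It is not the paper's implementation: the variable names (`a_is_ptr`, `b_is_field`), the confidence values, and the function `posterior_marginals` are hypothetical, and exhaustive enumeration stands in for whatever solver the system actually uses, since the abstract does not specify one.

```python
from itertools import product


def posterior_marginals(variables, constraints):
    """Return P(var = True) for each Boolean variable.

    `constraints` is a list of (predicate, confidence) pairs. A world
    (one truth assignment) is weighted by `confidence` for every
    predicate it satisfies and by `1 - confidence` for every predicate
    it violates. Marginals are obtained by normalising over all worlds.
    Enumeration is exponential in the number of variables, so this is
    only suitable for toy examples.
    """
    totals = {name: 0.0 for name in variables}
    normaliser = 0.0
    for values in product([False, True], repeat=len(variables)):
        world = dict(zip(variables, values))
        weight = 1.0
        for predicate, confidence in constraints:
            weight *= confidence if predicate(world) else 1.0 - confidence
        normaliser += weight
        for name in variables:
            if world[name]:
                totals[name] += weight
    return {name: totals[name] / normaliser for name in variables}


# Hypothetical random variables for two abstract memory locations.
variables = ["a_is_ptr", "b_is_field"]

# Hypothetical constraints with made-up confidence values.
constraints = [
    # A hint that location `a` holds a pointer.
    (lambda w: w["a_is_ptr"], 0.8),
    # If `a` is a pointer, then `b` is a field of the pointed-to structure.
    (lambda w: (not w["a_is_ptr"]) or w["b_is_field"], 0.9),
]

marginals = posterior_marginals(variables, constraints)
for name, probability in marginals.items():
    print(f"P({name}) = {probability:.3f}")
```

Each constraint contributes a soft vote rather than a hard fact, so conflicting hints lower a variable's posterior instead of causing the analysis to fail outright, which is the property a deterministic approach lacks.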
