Obfuscation resilient binary code reuse through trace-oriented programming

With the wide existence of binary code, it is desirable to reuse it in many security applications, such as malware analysis and software patching. While prior approaches have shown that binary code can be extracted and reused, they are often based on static analysis and face challenges when coping with obfuscated binaries. This paper introduces trace-oriented programming (TOP), a general framework for generating new software from existing binary code by elevating the low-level binary code to C code with templates and inlined assembly. Different from existing work, TOP gains benefits from dynamic analysis such as resilience against obfuscation and avoidance of points-to analysis. Thus, TOP can be used for malware analysis, especially for malware function analysis and identification. We have implemented a proof-of-concept of TOP and our evaluation results with a range of benign and malicious software indicate that TOP is able to reconstruct source code from binary execution traces in malware analysis and identification, and binary function transplanting.

[1]  Barton P. Miller,et al.  An empirical study of the reliability of UNIX utilities , 1990, Commun. ACM.

[2]  Peter T. Breuer,et al.  Programming Research Group Decompilation: the Enumeration of Types and Grammars Decompilation: the Enumeration of Types and Grammars , 1992 .

[3]  Jong-Deok Choi,et al.  Flow-Insensitive Interprocedural Alias Analysis in the Presence of Pointers , 1994, LCPC.

[4]  Cristina Cifuentes,et al.  Reverse compilation techniques , 1994 .

[5]  Peter T. Breuer,et al.  Decompilation: the enumeration of types and grammars , 1994, TOPL.

[6]  Cristina Cifuentes,et al.  Decompilation of binary programs , 1995, Softw. Pract. Exp..

[7]  Christian S. Collberg,et al.  A Taxonomy of Obfuscating Transformations , 1997 .

[8]  Saumya K. Debray,et al.  Alias analysis of executable code , 1998, POPL '98.

[9]  Donglin Liang,et al.  Efficient points-to analysis for whole-program analysis , 1999, ESEC/FSE-7.

[10]  Alan Mycroft,et al.  Type-Based Decompilation (or Program Reconstruction via Type Reconstruction) , 1999, ESOP.

[11]  Barton P. Miller,et al.  An empirical study of the robustness of Windows NT applications using random testing , 2000 .

[12]  Laurie Hendren,et al.  Decompiling Java Bytecode: Problems, Traps and Pitfalls , 2002, CC.

[13]  Tal Garfinkel,et al.  A Virtual Machine Introspection Based Architecture for Intrusion Detection , 2003, NDSS.

[14]  Mike Van Emmerik,et al.  Using a decompiler for real-world source recovery , 2004, 11th Working Conference on Reverse Engineering.

[15]  Tal Garfinkel,et al.  Understanding data lifetime via whole system simulation , 2004 .

[16]  Koushik Sen,et al.  CUTE: a concolic unit testing engine for C , 2005, ESEC/FSE-13.

[17]  James Newsome,et al.  Dynamic Taint Analysis for Automatic Detection, Analysis, and SignatureGeneration of Exploits on Commodity Software , 2005, NDSS.

[18]  Koushik Sen,et al.  DART: directed automated random testing , 2005, PLDI '05.

[19]  Harish Patil,et al.  Pin: building customized program analysis tools with dynamic instrumentation , 2005, PLDI '05.

[20]  Frederic T. Chong,et al.  Minos: Architectural support for protecting control data , 2006, TACO.

[21]  Wenke Lee,et al.  PolyUnpack: Automating the Hidden-Code Extraction of Unpack-Executing Malware , 2006, 2006 22nd Annual Computer Security Applications Conference (ACSAC'06).

[22]  Heng Yin,et al.  Renovo: a hidden code extractor for packed executables , 2007, WORM '07.

[23]  Christopher Krügel,et al.  Exploring Multiple Execution Paths for Malware Analysis , 2007, 2007 IEEE Symposium on Security and Privacy (SP '07).

[24]  Heng Yin,et al.  Dynamic Spyware Analysis , 2007, USENIX Annual Technical Conference.

[25]  Polyglot : Automatic Extraction of Protocol Format using Dynamic Binary Analysis , 2007 .

[26]  Somesh Jha,et al.  OmniUnpack: Fast, Generic, and Safe Unpacking of Malware , 2007, Twenty-Third Annual Computer Security Applications Conference (ACSAC 2007).

[27]  Chris Hankin,et al.  Efficient field-sensitive pointer analysis of C , 2007, TOPL.

[28]  Zhenkai Liang,et al.  Polyglot: automatic extraction of protocol message format using dynamic binary analysis , 2007, CCS '07.

[29]  Christopher Krügel,et al.  Automatic Network Protocol Analysis , 2008, NDSS.

[30]  Dawson R. Engler,et al.  EXE: automatically generating inputs of death , 2006, CCS '06.

[31]  Vinod Yegneswaran,et al.  Eureka: A Framework for Enabling Static Malware Analysis , 2008, ESORICS.

[32]  Patrice Godefroid,et al.  Automated Whitebox Fuzz Testing , 2008, NDSS.

[33]  Tzi-cker Chiueh,et al.  A Study of the Packer Problem and Its Solutions , 2008, RAID.

[34]  Adam Kiezun,et al.  Grammar-based whitebox fuzzing , 2008, PLDI '08.

[35]  Dawson R. Engler,et al.  KLEE: Unassisted and Automatic Generation of High-Coverage Tests for Complex Systems Programs , 2008, OSDI.

[36]  Helen J. Wang,et al.  Tupni: automatic reverse engineering of input formats , 2008, CCS.

[37]  Dawn Xiaodong Song,et al.  Dispatcher: enabling active botnet infiltration using automatic protocol reverse-engineering , 2009, CCS.

[38]  Jonathon T. Giffin,et al.  Automatic Reverse Engineering of Malware Emulators , 2009, 2009 30th IEEE Symposium on Security and Privacy.

[39]  Christopher Krügel,et al.  Effective and Efficient Malware Detection at the End Host , 2009, USENIX Security Symposium.

[40]  Stephen McCamant,et al.  Binary Code Extraction and Interface Identification for Security Applications , 2009, NDSS.

[41]  Xiangyu Zhang,et al.  Automatic Reverse Engineering of Data Structures from Binary Execution , 2010, NDSS.

[42]  Christopher Krügel,et al.  Inspector Gadget: Automated Extraction of Proprietary Gadgets from Malware Binaries , 2010, 2010 IEEE Symposium on Security and Privacy.

[43]  George Candea,et al.  S2E: a platform for in-vivo multi-path analysis of software systems , 2011, ASPLOS XVI.

[44]  Herbert Bos,et al.  Howard: A Dynamic Excavator for Reverse Engineering Data Structures , 2011, NDSS.

[45]  Jonathon T. Giffin,et al.  2011 IEEE Symposium on Security and Privacy Virtuoso: Narrowing the Semantic Gap in Virtual Machine Introspection , 2022 .

[46]  Gabriel Negreira Barbosa,et al.  Scientific but Not Academical Overview of Malware Anti-Debugging , Anti-Disassembly and Anti-VM Technologies , 2012 .

[47]  Yangchun Fu,et al.  Space Traveling across VM: Automatically Bridging the Semantic Gap in Virtual Machine Introspection via Online Kernel Data Redirection , 2012, 2012 IEEE Symposium on Security and Privacy.

[48]  Xiangyu Zhang,et al.  BISTRO: Binary Component Extraction and Embedding for Software Security Applications , 2013, ESORICS.

[49]  David Brumley,et al.  Native x86 Decompilation Using Semantics-Preserving Structural Analysis and Iterative Control-Flow Structuring , 2013, USENIX Security Symposium.

[50]  Yangchun Fu,et al.  EXTERIOR: using a dual-VM based external shell for guest-OS introspection, configuration, and recovery , 2013, VEE '13.