ORIGEN: Automatic Extraction of Offset-Revealing Instructions for Cross-Version Memory Analysis

Semantic gap is a prominent problem in raw memory analysis, especially in Virtual Machine Introspection (VMI) and memory forensics. For COTS software, common memory forensics and VMI tools rely on the so-called "data structure profiles" -- a mapping between the semantic variables and their relative offsets within the structure in the binary. Construction of such profiles requires the expert knowledge about the internal working of a specified software version. At most time, it requires considerable manual efforts, which often turns out to be a cumbersome process. In this paper, we propose a notion named "cross-version memory analysis", wherein our goal is to alleviate the process of profile construction for new versions of a software by transferring the knowledge from the model that has already been trained on its old version. To this end, we first identify such Offset Revealing Instructions (ORI) in a given software and then leverage the code search techniques to label ORIs in an unknown version of the same software. With labeled ORIs, we can localize the profile for the new version. We provide a proof-of-concept implementation called ORIGEN. The efficacy and efficiency of ORIGEN have been empirically verified by a number of softwares. The experimental results show that by conducting the ORI search within Windows XP SP0 and Linux 3.5.0, we can successfully recover the data structure profiles for Windows XP SP2, Vista, Win 7, and Linux 2.6.32, 3.8.0, 3.13.0, respectively. The systematical evaluation on 40 versions of OpenSSH demonstrates ORIGEN can achieve a precision of more than 90%. As a case study, we integrate ORIGEN into a VMI tool to automatically extract semantic information required for VMI. We develop two plugins to the Volatility memory forensic framework, one for OpenSSH session key extraction, the other for encrypted filesystem key extraction. Both of them can achieve the cross-version analysis by ORIGEN.

[1]  Yaniv David,et al.  Tracelet-based code search in executables , 2014, PLDI.

[2]  Dan Boneh,et al.  OpenConflict: Preventing Real Time Map Hacks in Online Games , 2011, 2011 IEEE Symposium on Security and Privacy.

[3]  Xuxian Jiang,et al.  Stealthy malware detection through vmm-based "out-of-the-box" semantic view reconstruction , 2007, CCS '07.

[4]  Heng Yin,et al.  Make it work, make it right, make it fast: building a platform-neutral whole-system dynamic binary analysis platform , 2014, ISSTA 2014.

[5]  Yangchun Fu,et al.  Space Traveling across VM: Automatically Bridging the Semantic Gap in Virtual Machine Introspection via Online Kernel Data Redirection , 2012, 2012 IEEE Symposium on Security and Privacy.

[6]  T. Dullien,et al.  Graph-based comparison of Executable Objects ( English Version ) , 2005 .

[7]  Zhongshu Gu,et al.  VCR: App-Agnostic Recovery of Photographic Evidence from Android Device Memory Images , 2015, CCS.

[8]  Christopher Krügel,et al.  Identifying Dormant Functionality in Malware Programs , 2010, 2010 IEEE Symposium on Security and Privacy.

[9]  Atul Prakash,et al.  Expose: Discovering Potential Binary Code Re-use , 2013, 2013 IEEE 37th Annual Computer Software and Applications Conference.

[10]  David Brumley,et al.  Towards Automatic Software Lineage Inference , 2013, USENIX Security Symposium.

[11]  Debin Gao,et al.  BinHunt: Automatically Finding Semantic Differences in Binary Programs , 2008, ICICS.

[12]  Christian Rossow,et al.  Leveraging semantic signatures for bug search in binary programs , 2014, ACSAC.

[13]  Kang G. Shin,et al.  Large-scale malware indexing using function-call graphs , 2009, CCS.

[14]  Heng Yin,et al.  MACE: high-coverage and robust memory analysis for commodity operating systems , 2014, ACSAC '14.

[15]  Thomas Dullien,et al.  Graph-based comparison of Executable Objects , 2005 .

[16]  William A. Arbaugh,et al.  FATKit: A framework for the extraction and analysis of digital forensic data from volatile system memory , 2006, Digit. Investig..

[17]  Christopher Krügel,et al.  Polymorphic Worm Detection Using Structural Information of Executables , 2005, RAID.

[18]  Christian Rossow,et al.  Cross-Architecture Bug Search in Binary Executables , 2015, 2015 IEEE Symposium on Security and Privacy.

[19]  Wenke Lee,et al.  Lares: An Architecture for Secure Active Monitoring Using Virtualization , 2008, 2008 IEEE Symposium on Security and Privacy (sp 2008).

[20]  Ross J. Anderson,et al.  Rendezvous: A search engine for binary code , 2013, 2013 10th Working Conference on Mining Software Repositories (MSR).

[21]  David Brumley,et al.  Blanket Execution: Dynamic Similarity Testing for Program Binaries and Components , 2014, USENIX Security Symposium.

[22]  Debin Gao,et al.  iBinHunt: Binary Hunting with Inter-procedural Control Flow , 2012, ICISC.

[23]  Tal Garfinkel,et al.  A Virtual Machine Introspection Based Architecture for Intrusion Detection , 2003, NDSS.

[24]  David Brumley,et al.  Automatically deriving pointer reference expressions from binary code for memory dump analysis , 2015, ESEC/SIGSOFT FSE.

[25]  Jonathon T. Giffin,et al.  2011 IEEE Symposium on Security and Privacy Virtuoso: Narrowing the Semantic Gap in Virtual Machine Introspection , 2022 .

[26]  Xiangyu Zhang,et al.  Automatic Reverse Engineering of Data Structures from Binary Execution , 2010, NDSS.

[27]  Steven S. Muchnick,et al.  Advanced Compiler Design and Implementation , 1997 .

[28]  Arun Lakhotia,et al.  Fast location of similar code fragments using semantic 'juice' , 2013, PPREW '13.

[29]  Brian Hay,et al.  Forensics examination of volatile system data using virtual introspection , 2008, OPSR.

[30]  Herbert Bos,et al.  Howard: A Dynamic Excavator for Reverse Engineering Data Structures , 2011, NDSS.

[31]  Wenke Lee,et al.  Secure and Flexible Monitoring of Virtual Machines , 2007, Twenty-Third Annual Computer Security Applications Conference (ACSAC 2007).

[32]  Wenke Lee,et al.  Ether: malware analysis via hardware virtualization extensions , 2008, CCS.

[33]  Alexander Aiken,et al.  Data-driven equivalence checking , 2013, OOPSLA.