Internet-Scale File Analysis

Malicious file analysis has moved well beyond the days when computing simple hashes of binaries was sufficient. Malicious PDF, Office, and other files present a far more diverse threat than our defensive tools were originally designed to handle. Even PE32 executables have been turned into poly- and metamorphic binaries, with layers of packing applied to evade detection. To make matters worse, the sheer influx of files to analyze presents a meaningful logistical problem, which becomes increasingly complex as analytic methods move from static to dynamic analysis. When the point-in-time problem is also considered, namely that historical discoveries can be viewed differently in light of new analytic techniques or information, the problem seems all but intractable. To this end, we designed the Skald framework, a blueprint for future analytic systems. We leveraged this framework to develop TOTEM, a system capable of coordinating, orchestrating, and scaling malware analytics across multiple cloud providers and thousands of running instances. TOTEM makes it easy to add new capabilities and can intelligently segregate work based on features such as file type, analytic duration, and computational complexity. TOTEM supports dynamic analysis through DRAKVUF, a novel open-source dynamic malware analysis system designed specifically to achieve unparalleled scalability while maintaining a high level of stealth and visibility into the executing sample. Building on the latest hardware virtualization extensions found in Intel processors and the Xen hypervisor, DRAKVUF remains completely hidden from the executing sample and requires no special software to be installed within the sandbox. By also addressing the problem of monitoring kernel-mode rootkits as well as userspace applications, DRAKVUF significantly raises the bar for evasive malware to remain undetected.
This paper will discuss the design, implementation, and practical deployment of TOTEM and DRAKVUF to analyze tremendous numbers of binary files.
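
As a rough illustration of what segregating work by file type and analytic cost could look like, the following is a minimal Python sketch. It is not TOTEM's actual interface: the `Analytic` record, the `route` function, the queue names `fast` and `slow`, and the small magic-number table are all hypothetical names introduced here for illustration only.

```python
from collections import defaultdict
from dataclasses import dataclass

# Hypothetical magic-number table; a real system would use a fuller
# file-type identification library rather than four prefixes.
MAGIC = {
    b"%PDF": "pdf",
    b"MZ": "pe",
    b"PK\x03\x04": "zip",         # ZIP container (used by modern Office files, among others)
    b"\xd0\xcf\x11\xe0": "ole",   # OLE compound file (legacy Office formats)
}


@dataclass(frozen=True)
class Analytic:
    """An analysis capability (assumed shape, not taken from the paper)."""
    name: str
    filetypes: frozenset   # file types this analytic accepts
    expensive: bool        # True for long-running or compute-heavy work


def detect_filetype(data: bytes) -> str:
    """Return a coarse file type from the leading bytes, or 'unknown'."""
    for magic, filetype in MAGIC.items():
        if data.startswith(magic):
            return filetype
    return "unknown"


def route(data: bytes, analytics) -> dict:
    """Group the applicable analytics into a 'fast' and a 'slow' queue."""
    filetype = detect_filetype(data)
    queues = defaultdict(list)
    for analytic in analytics:
        if filetype in analytic.filetypes:
            queue = "slow" if analytic.expensive else "fast"
            queues[queue].append(analytic.name)
    return dict(queues)


# Example registry: cheap static checks alongside an expensive dynamic run.
EXAMPLE_ANALYTICS = [
    Analytic("hash", frozenset({"pdf", "pe", "zip", "ole", "unknown"}), False),
    Analytic("pdf_structure", frozenset({"pdf"}), False),
    Analytic("sandbox_run", frozenset({"pe"}), True),
]
```

In this sketch, adding a capability means appending one `Analytic` entry, and keeping cheap and expensive work in separate queues means each pool of workers can be scaled independently.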
