AMAL: High-Fidelity, Behavior-Based Automated Malware Analysis and Classification

This paper introduces AMAL, an operational automated and behavior-based malware analysis and labeling (classification and clustering) system that addresses many limitations and shortcomings of the existing academic and industrial systems. AMAL consists of two sub-systems, AutoMal and MaLabel. AutoMal provides tools to collect low granularity behavioral artifacts that characterize malware usage of the file system, memory, network, and registry, and does that by running malware samples in virtualized environments. On the other hand, MaLabel uses those artifacts to create representative features, use them for building classifiers trained by manually-vetted training samples, and use those classifiers to classify malware samples into families similar in behavior. AutoMal also enables unsupervised learning, by implementing multiple clustering algorithms for samples grouping. An evaluation of both AutoMal and MaLabel based on medium-scale (4,000 samples) and large-scale datasets (more than 115,000 samples)—collected and analyzed by AutoMal over 13 months—show AMAL’s effectiveness in accurately characterizing, classifying, and grouping malware samples. MaLabel achieves a precision of 99.5 % and recall of 99.6 % for certain families’ classification, and more than 98 % of precision and recall for unsupervised clustering. Several benchmarks, costs estimates and measurements highlight and support the merits and features of AMAL.

[1]  Felix C. Freiling,et al.  TrumanBox: Improving Dynamic Malware Analysis by Emulating the Internet , 2011, SSS.

[2]  Frank Piessens,et al.  Fides: selectively hardening software application components against kernel-level or process-level malware , 2012, CCS '12.

[3]  Douglas S. Reeves,et al.  Fast malware classification by automated behavioral graph matching , 2010, CSIIRW '10.

[4]  Wenke Lee,et al.  Detecting Malware Domains at the Upper DNS Hierarchy , 2011, USENIX Security Symposium.

[5]  Christopher Krügel,et al.  JACKSTRAWS: Picking Command and Control Connections from Bot Traffic , 2011, USENIX Security Symposium.

[6]  Nick Feamster,et al.  Building a Dynamic Reputation System for DNS , 2010, USENIX Security Symposium.

[7]  Leyla Bilge,et al.  Exposure: A Passive DNS Analysis Service to Detect and Report Malicious Domains , 2014, TSEC.

[8]  Joris Kinable,et al.  Malware classification based on call graph clustering , 2010, Journal in Computer Virology.

[9]  Lynn Margaret Batten,et al.  Function length as a tool for malware classification , 2008, 2008 3rd International Conference on Malicious and Unwanted Software (MALWARE).

[10]  Carsten Willems,et al.  Learning and Classification of Malware Behavior , 2008, DIMVA.

[11]  Guofei Gu,et al.  BotMiner: Clustering Analysis of Network Traffic for Protocol- and Structure-Independent Botnet Detection , 2008, USENIX Security Symposium.

[12]  Herbert Bos,et al.  Prudent Practices for Designing Malware Experiments: Status Quo and Outlook , 2012, 2012 IEEE Symposium on Security and Privacy.

[13]  Vern Paxson,et al.  Outside the Closed World: On Using Machine Learning for Network Intrusion Detection , 2010, 2010 IEEE Symposium on Security and Privacy.

[14]  Leyla Bilge,et al.  Disclosure: detecting botnet command and control servers through large-scale NetFlow analysis , 2012, ACSAC '12.

[15]  Niels Provos,et al.  The Ghost in the Browser: Analysis of Web-based Malware , 2007, HotBots.

[16]  Felix C. Freiling,et al.  Measuring and Detecting Fast-Flux Service Networks , 2008, NDSS.

[17]  Zhuoqing Morley Mao,et al.  Automated Classification and Analysis of Internet Malware , 2007, RAID.

[18]  Fang Yu,et al.  Populated IP addresses: classification and applications , 2012, CCS.

[19]  Thorsten Holz,et al.  As the net churns: Fast-flux botnet observations , 2008, 2008 3rd International Conference on Malicious and Unwanted Software (MALWARE).

[20]  Christopher Krügel,et al.  Scalable, Behavior-Based Malware Clustering , 2009, NDSS.

[21]  Jingjing Yao,et al.  Malicious Executables Classification Based on Behavioral Factor Analysis , 2010, 2010 International Conference on e-Education, e-Business, e-Management and e-Learning.

[22]  Nick Feamster,et al.  Behavioral Clustering of HTTP-Based Malware and Signature Generation Using Malicious Network Traces , 2010, NSDI.

[23]  Marco Ramilli,et al.  Multi-stage delivery of malware , 2010, 2010 5th International Conference on Malicious and Unwanted Software.

[24]  Carsten Willems,et al.  Automatic analysis of malware behavior using machine learning , 2011, J. Comput. Secur..