Based on Multi-features and Clustering Ensemble Method for Automatic Malware Categorization

Automatic malware categorization plays an important role in combating the current large volume of malware and in aiding the corresponding forensics. Generally, a lot of sample information can be extracted with static tools and dynamic sandboxes for malware analysis. Combining these features effectively for further analysis would provide a better understanding of the samples. On the other hand, most current work on malware analysis relies on a single category of machine learning algorithm to categorize samples, even though different clustering algorithms have their own strengths and weaknesses. How to combine the merits of multiple categories of features and algorithms to further improve the analysis result is therefore critical. In this paper, we propose a novel scalable malware analysis framework that exploits the complementary nature of different features and algorithms to optimally integrate their results. Using the concept of clustering ensemble, our system combines the partitions obtained from each individual category of feature and algorithm to achieve better quality and robustness. Our system is composed of the following three parts: (1) extracting multiple categories of static and dynamic features; (2) using the k-means and hierarchical clustering algorithms to construct the base clusterings; (3) applying an efficient method based on a mixture model clustering ensemble to conduct an effective clustering analysis. We have evaluated our method on two malware datasets, namely the Microsoft malware dataset and our own malware dataset, which contain 10868 and 53760 samples respectively. Our experimental results show that our method categorizes malware with better quality and robustness. Our method also has certain advantages in system run time and memory consumption compared with state-of-the-art malware analysis works.
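
To make the ensemble step more concrete, the following is a minimal sketch of a mixture model consensus function. It is an illustration under stated assumptions, not the implementation evaluated in the paper. Each sample is described by the vector of labels it received from the base clusterings, and EM fits a mixture of multinomial distributions over those label vectors. The function name `mixture_consensus`, its parameters, the smoothing constant, and the deterministic initialisation from the first base partition are all hypothetical choices made for this example. The three hand-written base partitions stand in for the output of k-means and hierarchical clustering on different feature categories.

```python
import math

def mixture_consensus(partitions, k, n_iter=50, smooth=1e-3):
    """Consensus labels via EM on a mixture of multinomials over base labels.

    partitions: list of base clusterings, each a list of integer labels
                (assumed to be 0..L-1) for the same n samples.
    k:          number of consensus clusters (assumed known here).
    """
    n = len(partitions[0])
    n_labels = [max(p) + 1 for p in partitions]

    # Assumption: soft, deterministic initialisation from the first base
    # partition, so that the example is reproducible without a random seed.
    resp = []
    for i in range(n):
        row = [0.1] * k
        row[partitions[0][i] % k] = 1.0
        total = sum(row)
        resp.append([v / total for v in row])

    for _ in range(n_iter):
        # M-step: mixing weights and per-partition label probabilities.
        alpha = [sum(resp[i][m] for i in range(n)) / n for m in range(k)]
        theta = []
        for m in range(k):
            per_partition = []
            for j, p in enumerate(partitions):
                counts = [smooth] * n_labels[j]  # smoothing avoids log(0)
                for i in range(n):
                    counts[p[i]] += resp[i][m]
                total = sum(counts)
                per_partition.append([c / total for c in counts])
            theta.append(per_partition)

        # E-step: posterior responsibility of each component for each sample,
        # computed in log space for numerical stability.
        for i in range(n):
            logp = []
            for m in range(k):
                lp = math.log(alpha[m] + 1e-12)
                for j, p in enumerate(partitions):
                    lp += math.log(theta[m][j][p[i]])
                logp.append(lp)
            top = max(logp)
            weights = [math.exp(v - top) for v in logp]
            z = sum(weights)
            resp[i] = [w / z for w in weights]

    # Hard consensus label: the component with the highest responsibility.
    return [max(range(k), key=lambda m: resp[i][m]) for i in range(n)]


# Hypothetical base clusterings of six samples. The second is a relabeling
# of the first; the third disagrees about sample 2.
base = [
    [0, 0, 0, 1, 1, 1],
    [1, 1, 1, 0, 0, 0],
    [0, 0, 1, 2, 2, 2],
]
consensus = mixture_consensus(base, k=2)
# The first three samples share one consensus label, the last three another.
```

Because the model only sees label vectors, it does not need the base clusterings to use matching label names, which is why the relabeled second partition still supports the same grouping. In practice the number of consensus clusters and the initialisation would need more care than this sketch gives them.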
