A malware variants detection methodology with an opcode based feature method and a fast density based clustering algorithm

Malware is one of the most serious security threats facing the Internet today. In practice, the most widely used malware detection method is static detection, which is effective for many types of malware. Operation code (opcode) sequences are among the most important malware features for static analysis. In this paper, our goal is to optimize detection accuracy and performance based on opcode features. The diversity of opcodes produces a high-dimensional feature space for malware, which leads to low performance. We propose an information-entropy-based feature extraction method that extracts a small number of highly informative features to represent malware instances. In addition, machine learning algorithms perform slowly on large feature sets in the training and detection phases, so we propose a generic Fast Density-Based Clustering algorithm for clustering malware instances quickly and accurately. Our experiments demonstrate that our automated malware variant detection methodology achieves high accuracy with a significant speedup compared with other state-of-the-art approaches.
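
The abstract does not spell out how the entropy-based extraction works, so the following is only a minimal sketch of one common reading: score each opcode by its information gain (the reduction in Shannon entropy of the family labels when samples are split on whether the opcode is present) and keep the top-scoring few. The function names, the toy opcode lists, and the labels `family_a` and `family_b` are all hypothetical and are not taken from the paper.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy, in bits, of a list of class labels."""
    total = len(labels)
    if total == 0:
        return 0.0
    counts = Counter(labels)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def information_gain(samples, labels, opcode):
    """Entropy reduction obtained by splitting on the presence of `opcode`."""
    present = [y for ops, y in zip(samples, labels) if opcode in ops]
    absent = [y for ops, y in zip(samples, labels) if opcode not in ops]
    n = len(labels)
    conditional = (len(present) / n) * entropy(present) \
        + (len(absent) / n) * entropy(absent)
    return entropy(labels) - conditional


def top_opcodes(samples, labels, k):
    """Return the k opcodes with the highest information gain.

    Ties are broken alphabetically so the result is deterministic.
    """
    vocabulary = sorted({op for ops in samples for op in ops})
    scored = [(information_gain(samples, labels, op), op) for op in vocabulary]
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [op for _, op in scored[:k]]


# Toy data (assumed): each sample is the opcode list of one binary.
samples = [
    ["push", "mov", "xor", "call"],
    ["push", "mov", "xor", "jmp"],
    ["push", "mov", "add", "ret"],
    ["push", "mov", "sub", "ret"],
]
labels = ["family_a", "family_a", "family_b", "family_b"]

selected = top_opcodes(samples, labels, 2)
print(selected)
```

In this toy data, opcodes that occur in every sample (`push`, `mov`) carry no information about the family and score zero, while opcodes that occur in exactly one family score highest. Discarding low-scoring opcodes is how such a scheme would shrink the feature space before clustering.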
