Malware Visualization for Fine-Grained Classification

Due to the rapid rise of automated tools, the number of malware variants has increased dramatically, posing a serious threat to the security of the Internet. Several methods for rapid malware analysis have been proposed recently, but they usually incur a large computational overhead and cannot classify samples accurately on large-scale, complex malware data sets. In this paper, we therefore propose a new visualization method that characterizes malware both globally and locally to achieve fast and effective fine-grained classification. We visualize malware as RGB-colored images and extract global features from the images. The gray-level co-occurrence matrix and color moments are selected to describe the global texture features and color features, respectively; both yield low-dimensional feature data, which reduces the complexity of training the model. In addition, a series of special byte sequences is extracted from the code sections and data sections of the malware and converted into feature vectors by Simhash to serve as the local features. Finally, we merge the global features and local features and perform malware classification using random forest, K-nearest neighbor, and support vector machine classifiers. Experimental results show that our approach achieves a highest accuracy of 97.47% and a highest F-measure of 96.85% on 7087 samples from 15 families. Adding the color features and the local features to classification based on texture features improves the F-measure by 3.4% and 1%, respectively. Overall, the combination of global features and local features can realize fine-grained malware classification at low computational cost.
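As a rough illustration of the feature pipeline described above, the following Python sketch computes a gray-level co-occurrence matrix with two texture statistics, per-channel color moments, and a Simhash fingerprint, then concatenates them into one vector. It is not the authors' implementation. The byte-to-RGB mapping, the byte n-grams standing in for the "special byte sequences", the MD5 token hash, the 64-bit fingerprint width, the choice of contrast and energy as texture statistics, merging by concatenation, and all function names are assumptions made for illustration; the abstract does not specify them.

```python
import hashlib
import math


def bytes_to_rgb_pixels(data: bytes) -> list:
    """Assumed mapping: consecutive byte triples become (R, G, B) pixels."""
    padded = data + b"\x00" * (-len(data) % 3)  # zero-pad the tail
    return [tuple(padded[i:i + 3]) for i in range(0, len(padded), 3)]


def color_moments(pixels: list) -> list:
    """Mean, standard deviation, and skewness per channel (9 values)."""
    features = []
    n = len(pixels)
    for c in range(3):
        values = [p[c] for p in pixels]
        mean = sum(values) / n
        var = sum((v - mean) ** 2 for v in values) / n
        third = sum((v - mean) ** 3 for v in values) / n
        # Signed cube root of the third central moment.
        skew = math.copysign(abs(third) ** (1 / 3), third)
        features += [mean, math.sqrt(var), skew]
    return features


def glcm(gray: list, levels: int, dx: int = 1, dy: int = 0) -> list:
    """Normalized co-occurrence matrix for pixel pairs offset by (dx, dy)."""
    counts = [[0] * levels for _ in range(levels)]
    total = 0
    for y in range(len(gray) - dy):
        for x in range(len(gray[0]) - dx):
            counts[gray[y][x]][gray[y + dy][x + dx]] += 1
            total += 1
    return [[c / total for c in row] for row in counts]


def texture_features(p: list) -> list:
    """Two common GLCM statistics: contrast and energy."""
    size = len(p)
    contrast = sum(p[i][j] * (i - j) ** 2
                   for i in range(size) for j in range(size))
    energy = sum(v * v for row in p for v in row)
    return [contrast, energy]


def byte_ngrams(data: bytes, n: int = 4) -> list:
    """Hypothetical stand-in for the paper's special byte sequences."""
    return [data[i:i + n] for i in range(len(data) - n + 1)]


def simhash(tokens: list, bits: int = 64) -> list:
    """Simhash: each token's hash votes +1/-1 on every bit position."""
    weights = [0] * bits
    for tok in tokens:
        h = int.from_bytes(hashlib.md5(tok).digest()[:bits // 8], "big")
        for i in range(bits):
            weights[i] += 1 if (h >> i) & 1 else -1
    return [1 if w > 0 else 0 for w in weights]


# Toy input standing in for a malware binary.
sample = bytes(range(256)) * 4
pixels = bytes_to_rgb_pixels(sample)

# Quantize the red channel to 8 gray levels and cut it into rows of 16 pixels.
reds = [p[0] // 32 for p in pixels]
rows = [reds[i:i + 16] for i in range(0, len(reds) - 15, 16)]

# Merge global (texture + color) and local (Simhash) features.
vector = (texture_features(glcm(rows, levels=8))
          + color_moments(pixels)
          + simhash(byte_ngrams(sample)))
```

The resulting vector is short (2 texture values, 9 color moments, and 64 fingerprint bits in this sketch), which is the property the abstract relies on to keep training cheap. A vector of this kind could then be passed to any of the three classifiers named above.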
