Malware identification using visualization images and deep learning

Abstract

Malware is currently one of the most serious threats to Internet security. This paper proposes MCSC (Malware Classification using SimHash and CNN), a malware classification algorithm based on static features. MCSC converts disassembled malware code into grayscale images using SimHash and then identifies each sample's family with a convolutional neural network (CNN). Several techniques, including multi-hash, major block selection, and bilinear interpolation, are applied along the way to improve performance. Experimental results show that MCSC classifies malware families effectively, even when samples are unevenly distributed across families. On a dataset of 10,805 malware samples, its classification accuracy reaches 99.260% at best and 98.862% on average, higher than that of the compared algorithms. Moreover, MCSC takes only 1.41 s on average to recognize a new sample, which meets the requirements of most practical applications.
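The abstract names the pipeline stages but not their details. The Python sketch below shows one plausible way they could fit together; it is not MCSC's implementation, and every concrete choice in it is an assumption: each disassembled instruction string is treated as a SimHash token, salted MD5 serves as the per-token hash, several differently salted 64-bit fingerprints are concatenated as a stand-in for "multi-hash", the basic blocks with the most instructions are kept as a stand-in for "major block selection", each fingerprint bit becomes a black or white pixel, and the result is resized to 32×32 with bilinear interpolation. The helper names (`token_hash`, `simhash`, `bilinear_resize`, `blocks_to_image`) and parameters (`n_hashes`, `top_k`, `size`) are hypothetical.

```python
import hashlib
from typing import List, Sequence

import numpy as np


def token_hash(token: str, bits: int = 64, salt: str = "") -> int:
    """Hash one token to a `bits`-bit integer (salted MD5 is an assumed choice)."""
    if not 0 < bits <= 128:
        raise ValueError("bits must be in 1..128 for an MD5-based hash")
    digest = hashlib.md5((salt + token).encode("utf-8")).digest()
    return int.from_bytes(digest, "big") & ((1 << bits) - 1)


def simhash(tokens: Sequence[str], bits: int = 64, salt: str = "") -> List[int]:
    """Charikar-style SimHash: per-bit +1/-1 vote over token hashes, as a 0/1 list."""
    votes = [0] * bits
    for tok in tokens:
        h = token_hash(tok, bits, salt)
        for i in range(bits):
            votes[i] += 1 if (h >> i) & 1 else -1
    # Bit i of the fingerprint is 1 when the majority of token hashes set bit i.
    return [1 if v > 0 else 0 for v in votes]


def bilinear_resize(img: np.ndarray, out_h: int, out_w: int) -> np.ndarray:
    """Resize a 2-D array with bilinear interpolation (corner pixels aligned)."""
    in_h, in_w = img.shape
    src = img.astype(np.float64)
    # Map each output coordinate back to a fractional source coordinate.
    ys = np.linspace(0, in_h - 1, out_h)
    xs = np.linspace(0, in_w - 1, out_w)
    y0 = np.floor(ys).astype(int)
    x0 = np.floor(xs).astype(int)
    y1 = np.minimum(y0 + 1, in_h - 1)
    x1 = np.minimum(x0 + 1, in_w - 1)
    wy = (ys - y0)[:, None]
    wx = (xs - x0)[None, :]
    # Interpolate horizontally on the two neighbouring rows, then vertically.
    top = src[y0][:, x0] * (1 - wx) + src[y0][:, x1] * wx
    bottom = src[y1][:, x0] * (1 - wx) + src[y1][:, x1] * wx
    return top * (1 - wy) + bottom * wy


def blocks_to_image(
    blocks: Sequence[Sequence[str]],
    n_hashes: int = 2,
    bits: int = 64,
    top_k: int = 16,
    size: int = 32,
) -> np.ndarray:
    """Turn disassembled code blocks into a fixed-size grayscale image (hypothetical)."""
    if not blocks:
        raise ValueError("need at least one code block")
    # Hypothetical "major block selection": keep the top_k largest blocks, in original order.
    largest = sorted(range(len(blocks)), key=lambda i: len(blocks[i]), reverse=True)[:top_k]
    selected = [blocks[i] for i in sorted(largest)]
    # Hypothetical "multi-hash": concatenate SimHashes computed with differently salted hashes.
    rows = [
        [bit for k in range(n_hashes) for bit in simhash(block, bits, salt=f"h{k}:")]
        for block in selected
    ]
    img = np.array(rows, dtype=np.float64) * 255.0  # bit 1 -> white, bit 0 -> black
    # Bilinear interpolation gives every sample the same input size for the CNN.
    resized = bilinear_resize(img, size, size)
    return np.clip(np.rint(resized), 0, 255).astype(np.uint8)


if __name__ == "__main__":
    sample = [
        ["push ebp", "mov ebp, esp", "sub esp, 0x10", "call sub_401000"],
        ["xor eax, eax", "pop ebp", "ret"],
        ["mov ecx, [ebp+8]", "test ecx, ecx", "jz loc_401020"],
    ]
    image = blocks_to_image(sample)
    print(image.shape, image.dtype)  # (32, 32) uint8
```

Because SimHash maps token sets that largely overlap to fingerprints with a small Hamming distance, variants of one family that share most of their code tend to yield similar image rows, which is the kind of pattern a CNN can learn. The fixed-size resize decouples the CNN's input shape from how many blocks a sample contains.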
