MalFCS: An effective malware classification framework with automated feature extraction based on deep convolutional neural networks

Abstract

Identifying the family of a malware sample can reveal its malicious intent and attack patterns, which helps analysts process large numbers of malware variants efficiently. Methods based on traditional machine learning often require substantial time and resources for feature engineering. Virtually all existing static analysis methods based on malware visualization rely on grayscale images, and a single low-order feature representation of this kind may hinder the discovery of features hidden within a malware family. To address these problems, this paper proposes MalFCS, an effective malware classification framework based on malware visualization and automated feature extraction. MalFCS consists of three main modules: malware visualization, feature extraction, and classification. First, we visualize malware binaries as entropy graphs based on structural entropy. Second, we present a feature extractor based on deep convolutional neural networks that automatically extracts the patterns shared by a family from its entropy graphs. Finally, an SVM classifier classifies malware based on the extracted features. We evaluate MalFCS on two widely studied benchmark datasets, Malimg and Microsoft. Experimental results show that MalFCS achieves state-of-the-art classification performance, with accuracies of 0.997 and 1 on the two datasets, respectively.
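
The visualization step turns a binary into an entropy graph based on structural entropy. The abstract does not specify how that graph is built, so the Python sketch below is only one plausible reading. It assumes the common formulation of structural entropy: split the file into fixed-size chunks and compute the Shannon entropy of the byte distribution in each chunk. The chunk size (256 bytes), the scaling of entropy from [0, 8] bits to 8-bit intensities, the image width, and the helper names `chunk_entropy`, `structural_entropy`, and `entropy_image` are hypothetical choices, not MalFCS parameters.

```python
import math
from collections import Counter


def chunk_entropy(chunk: bytes) -> float:
    """Shannon entropy, in bits per byte, of the byte distribution in one chunk."""
    if not chunk:
        return 0.0
    counts = Counter(chunk)  # iterating bytes yields ints 0..255
    n = len(chunk)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())


def structural_entropy(data: bytes, chunk_size: int = 256) -> list[float]:
    """Entropy of each consecutive fixed-size chunk (the last chunk may be shorter).

    Hypothetical parameter: chunk_size=256 is illustrative, not taken from the paper.
    """
    return [chunk_entropy(data[i:i + chunk_size])
            for i in range(0, len(data), chunk_size)]


def entropy_image(data: bytes, chunk_size: int = 256, width: int = 16) -> list[list[int]]:
    """Fold the entropy sequence into a 2-D grid of 0..255 intensities.

    Assumption: entropies in [0, 8] bits are scaled linearly to [0, 255], and the
    last row is zero-padded to `width`. Both choices are illustrative only.
    """
    values = [round(e / 8.0 * 255) for e in structural_entropy(data, chunk_size)]
    values += [0] * (-len(values) % width)  # pad up to a multiple of width
    return [values[r:r + width] for r in range(0, len(values), width)]
```

For example, a 512-byte input whose first chunk holds every byte value once and whose second chunk is all zeros yields the entropy sequence `[8.0, 0.0]`, which folds into a single row starting with 255 followed by 0. A grid like this would presumably still need resizing to the CNN's input shape; the point of the sketch is that each pixel encodes local entropy rather than a raw byte value, which is what separates this representation from plain grayscale malware images.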
