File Entropy Signal Analysis Combined With Wavelet Decomposition for Malware Classification

With the rapid development of the Internet, the number of malware variants has grown exponentially, posing a major threat to cyber security. Persistent efforts have been made to classify malware variants, but many challenges remain, including the inability to handle diverse variants that belong to similar families and the high time and resource cost of existing approaches. This paper proposes a novel method, called Malware Entropy Sequences Reflect the Family (MESRF), that improves malware classification using features of entropy sequences. Entropy has performed well in many areas of prior research. First, global features of the signal are extracted from the entropy sequence using statistical methods. Next, local features (i.e., structural entropy features) are extracted with the discrete wavelet decomposition algorithm and vectorized with the Bag-of-words model, which gives the method its high classification accuracy. To evaluate the method, we conducted numerous experiments on malware datasets containing more than 20,000 samples. In these experiments, MESRF outperformed other malware classification models, reaching an accuracy of 99.83% and an ROC of 99.98% on the malimg dataset.
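
The pipeline described in the abstract (entropy sequence, global statistics, wavelet-based local features, Bag-of-words vectorization) can be illustrated with a minimal sketch. It is not the MESRF implementation: the block size, the choice of the Haar wavelet, the `up`/`down`/`flat` word vocabulary, the threshold, and all function names below are assumptions made only for illustration.

```python
import math
from collections import Counter
from statistics import mean, pstdev


def entropy_sequence(data: bytes, block_size: int = 256) -> list[float]:
    """Shannon entropy (bits per byte, 0..8) of each consecutive block.

    block_size is an assumed value; the abstract does not specify one.
    """
    seq = []
    for start in range(0, len(data), block_size):
        block = data[start:start + block_size]
        n = len(block)
        counts = Counter(block)
        seq.append(sum(-(c / n) * math.log2(c / n) for c in counts.values()))
    return seq


def global_features(seq: list[float]) -> dict[str, float]:
    """Simple whole-signal statistics (an assumed, illustrative feature set)."""
    return {"mean": mean(seq), "std": pstdev(seq), "max": max(seq), "min": min(seq)}


def haar_step(signal: list[float]) -> tuple[list[float], list[float]]:
    """One level of the Haar discrete wavelet transform.

    Odd-length input is padded by repeating the last value.
    """
    if len(signal) % 2:
        signal = signal + [signal[-1]]
    s = math.sqrt(2.0)
    pairs = list(zip(signal[0::2], signal[1::2]))
    approx = [(a + b) / s for a, b in pairs]
    detail = [(a - b) / s for a, b in pairs]
    return approx, detail


def bag_of_words(seq: list[float], levels: int = 2, threshold: float = 0.5) -> Counter:
    """Turn detail coefficients into 'words' and count them.

    Hypothetical vocabulary: per level, 'up' (entropy rises across the pair),
    'down' (entropy falls), or 'flat' (change below the threshold).
    """
    words: Counter = Counter()
    approx = list(seq)
    for level in range(1, levels + 1):
        if len(approx) < 2:
            break
        approx, detail = haar_step(approx)
        for d in detail:
            if abs(d) < threshold:
                kind = "flat"
            else:
                kind = "up" if d < 0 else "down"
            words[f"L{level}:{kind}"] += 1
    return words


# Toy input: one constant block followed by one block containing every byte value.
sample = bytes(256) + bytes(range(256))
seq = entropy_sequence(sample)
print(seq, global_features(seq), bag_of_words(seq))
```

In a fuller pipeline, the global statistics and the word counts would be concatenated into one feature vector per sample and passed to a classifier. A library such as PyWavelets could replace the hand-written Haar step.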
