Malfustection: Obfuscated Malware Detection and Malware Classification with Data Shortage by Combining Semi-Supervised and Contrastive Learning

As new technologies emerge, digital devices of many kinds are becoming widespread. Because everyday tasks now depend on technology, this extensive use of computers creates opportunities for malicious activity, and it is important to provide defenses against these threats. Malware is one of the best-known and most widely used tools that attackers employ for destructive purposes. Producing malware from scratch is difficult, so attackers tend to obfuscate existing malware until it becomes an unrecognizable program. Because obfuscation can transform an old sample into a new one in many creative ways, identifying obfuscated malware is hard. In this research, we propose a solution to this problem that first converts the code to an image and then applies a semi-supervised approach combined with contrastive learning. Under this representation, an obfuscation of the malware bytecode corresponds to an augmentation of the image. Hence, by using meaningful augmentations that simulate individual obfuscation changes, and by combining them to generate complex obfuscation procedures, the proposed solution is able to construct, learn, and detect a wide range of obfuscations. This work addresses two issues: 1) malware classification despite a shortage of data and 2) obfuscated malware detection by training on non-obfuscated malware. According to the results, the proposed method overcomes the data shortage problem in malware classification: it reaches 90.1% accuracy when only 10% of the data is used to train the model. Moreover, training on basic malware without obfuscation achieves 96.21% accuracy in detecting obfuscated malware.
