Deep learning-aided runtime opcode-based Windows malware detection

Thousands of new malware samples are developed every day. Signature-based methods, which common malware detectors employ, are susceptible to code obfuscation and fail against novel malware. In this paper, we present an alternative method for malware detection that uses assembly opcode sequences obtained at runtime. First, we treat the opcode sequences as sequential data and apply natural language processing and deep learning techniques to extract deeper behavioral features. Owing to these features, the method can be robust to code obfuscation and effective against novel malware. Finally, the features are fed to various machine learning algorithms for classification. Experiments on a more class-balanced dataset of 26869 samples demonstrate that an MCC (Matthews correlation coefficient) score as high as 0.95 is achievable with this approach. The MCC scores for the experiments conducted on the imbalanced and artificially balanced datasets are 0.81 and 0.83, respectively.
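For readers unfamiliar with the metric reported above, the following is a minimal sketch of how the Matthews correlation coefficient is computed from binary confusion-matrix counts. The function name `mcc` and the example counts are hypothetical and chosen for illustration only; they are not taken from the paper's experiments.

```python
import math


def mcc(tp: int, tn: int, fp: int, fn: int) -> float:
    """Matthews correlation coefficient from binary confusion-matrix counts."""
    numerator = tp * tn - fp * fn
    denominator = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # Common convention: if any row or column of the confusion matrix is
    # empty, the coefficient is undefined and is reported as 0.0.
    return numerator / denominator if denominator else 0.0


# Hypothetical imbalanced case: 10 malware samples, 90 benign samples.
# The classifier finds only half of the malware but is 95% accurate overall.
tp, tn, fp, fn = 5, 90, 0, 5
accuracy = (tp + tn) / (tp + tn + fp + fn)
print(f"accuracy = {accuracy:.2f}, MCC = {mcc(tp, tn, fp, fn):.2f}")
```

Because MCC uses all four cells of the confusion matrix, it stays informative when one class dominates, which is presumably why it is the headline metric for both the balanced and the imbalanced datasets here.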
