Windows PE Malware Detection Using Ensemble Learning

In the Internet age, users face a growing number of threats to their security and safety every day. One such threat is malicious software, or malware (ransomware, Trojans, viruses, etc.), which can lead to the loss or malicious modification of important information such as bank account details. Malware creators have managed to bypass traditional detection methods, which can be time-consuming and unreliable against previously unknown malware. This motivates the need for intelligent detection methods, especially for new malware that has not been analyzed or studied before. Machine learning offers such an approach, and machine learning-based detection comprises two stages: feature extraction and classification. This study proposes an ensemble learning-based method for malware detection. The base-stage classification is performed by a stacked ensemble of fully connected and one-dimensional convolutional neural networks (CNNs), whereas the end-stage classification is performed by a machine learning algorithm. For the meta-learner, we analyzed and compared 15 machine learning classifiers. For comparison, five machine learning algorithms were used: naive Bayes, decision tree, random forest, gradient boosting, and AdaBoost. Experimental results on the Windows Portable Executable (PE) malware dataset are presented. The best results were obtained by an ensemble of seven neural networks with the ExtraTrees classifier as the final-stage classifier.
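The two-stage scheme described above is a form of stacked generalization: base models produce out-of-fold predictions, and a meta-learner is trained on those predictions. The following is a minimal sketch, not the authors' implementation, assuming scikit-learn is available. Because scikit-learn has no one-dimensional CNN, only fully connected `MLPClassifier` networks stand in for the base stage; the layer sizes in `HIDDEN_LAYOUTS` are hypothetical, and synthetic data from `make_classification` replaces the Windows PE dataset. `ExtraTreesClassifier` fills the meta-learner role that the abstract reports as best.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import ExtraTreesClassifier, StackingClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Hypothetical hidden-layer layouts for the fully connected base networks;
# the abstract does not specify the actual architectures.
HIDDEN_LAYOUTS = [(32,), (64,), (32, 16)]


def build_stacked_detector(n_base=3, seed=0):
    """Base stage: several MLPs; end stage: an ExtraTrees meta-learner."""
    base_learners = []
    for i in range(n_base):
        mlp = MLPClassifier(
            hidden_layer_sizes=HIDDEN_LAYOUTS[i % len(HIDDEN_LAYOUTS)],
            max_iter=500,
            random_state=seed + i,
        )
        # Scaling inside the pipeline is refit per CV fold, so no test-fold leakage.
        base_learners.append((f"mlp_{i}", make_pipeline(StandardScaler(), mlp)))
    return StackingClassifier(
        estimators=base_learners,
        # The meta-learner is trained on out-of-fold base-model probabilities.
        final_estimator=ExtraTreesClassifier(n_estimators=200, random_state=seed),
        stack_method="predict_proba",
        cv=5,
    )


def run_demo(seed=0):
    # Synthetic stand-in for PE feature vectors (label 1 plays the role of "malware").
    X, y = make_classification(
        n_samples=600, n_features=40, n_informative=12, random_state=seed
    )
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.25, stratify=y, random_state=seed
    )
    model = build_stacked_detector(seed=seed).fit(X_train, y_train)
    accuracy = accuracy_score(y_test, model.predict(X_test))
    # Base-stage outputs, i.e. the features the ExtraTrees meta-learner sees.
    meta_features = model.transform(X_test)
    return model, accuracy, meta_features, X_test
```

Calling `run_demo()` returns the fitted stack, its held-out accuracy on the synthetic data, and the meta-feature matrix. To imitate the meta-learner comparison, one could pass a different `final_estimator` (for example `RandomForestClassifier` or `GradientBoostingClassifier`) and compare held-out scores; no results from the study are reproduced here.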