Machine Learning Methods for Malware Detection and Classification

Malware detection is an important factor in the security of computer systems. However, the signature-based methods currently in use cannot reliably detect zero-day attacks and polymorphic viruses, which motivates machine learning-based detection. The purpose of this work was to determine which feature extraction, feature representation, and classification methods yield the best accuracy when applied on top of Cuckoo Sandbox. Specifically, k-Nearest-Neighbors, Decision Trees, Support Vector Machines, Naive Bayes, and Random Forest classifiers were evaluated. The dataset used in this study consisted of 1156 malware files from 9 families of different types and 984 benign files of various formats. This work presents recommended methods for machine learning-based malware classification and detection, as well as guidelines for their implementation. The study can also serve as a basis for further research on malware analysis with machine learning methods.
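The abstract does not state which features were extracted from the Cuckoo Sandbox reports or how they were represented. As one hypothetical illustration of the pipeline it summarizes (feature extraction, feature representation, classification), the following standard-library sketch pulls API call names out of a Cuckoo-style JSON report, represents them as bigram frequency vectors (with an optional binary variant), and labels a sample with k-Nearest-Neighbors under cosine similarity. The report layout (`report["behavior"]["processes"][i]["calls"][j]["api"]`), every function name, and the toy API sequences are assumptions made for illustration; they are not the study's method or data.

```python
import math
from collections import Counter


def extract_api_calls(report: dict) -> list[str]:
    """Collect API call names, in call order, from a Cuckoo-style report.

    Assumed layout (check it against the Cuckoo version in use):
    report["behavior"]["processes"][i]["calls"][j]["api"].
    """
    calls = []
    for process in report.get("behavior", {}).get("processes", []):
        for call in process.get("calls", []):
            if "api" in call:
                calls.append(call["api"])
    return calls


def ngrams(seq: list[str], n: int = 2) -> Counter:
    """Frequency representation: count each contiguous n-gram of API calls."""
    return Counter(tuple(seq[i:i + n]) for i in range(len(seq) - n + 1))


def binarize(counts: Counter) -> Counter:
    """Binary representation: 1 if the n-gram occurs at all, else absent."""
    return Counter(dict.fromkeys(counts, 1))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse feature vectors."""
    dot = sum(value * b[key] for key, value in a.items() if key in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def knn_predict(train: list[tuple[Counter, str]], query: Counter, k: int = 3) -> str:
    """Return the majority label among the k most similar training samples."""
    ranked = sorted(train, key=lambda item: cosine(item[0], query), reverse=True)
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]


def make_report(apis: list[str]) -> dict:
    """Hypothetical helper: wrap an API sequence in a minimal report dict."""
    return {"behavior": {"processes": [{"calls": [{"api": a} for a in apis]}]}}


# Made-up API sequences for illustration only; not data from the study.
samples = [
    (["NtOpenFile", "NtReadFile", "NtClose"], "benign"),
    (["NtOpenFile", "NtReadFile", "NtReadFile", "NtClose"], "benign"),
    (["NtOpenFile", "CryptEncrypt", "NtWriteFile", "NtDeleteFile"], "malware"),
    (["CryptEncrypt", "NtWriteFile", "NtDeleteFile", "CryptEncrypt", "NtWriteFile"], "malware"),
]
train = [(ngrams(extract_api_calls(make_report(apis))), label) for apis, label in samples]

query = ngrams(["NtOpenFile", "CryptEncrypt", "NtWriteFile", "NtDeleteFile", "NtClose"])
print(knn_predict(train, query, k=3))  # → malware
```

Wrapping each vector in `binarize(...)` switches from the frequency to the binary representation, and varying `n` and `k` gives a small-scale version of the comparison of representations and settings the abstract describes. Decision Trees, Support Vector Machines, Naive Bayes, and Random Forest are usually taken from an established library rather than written by hand, so they are not sketched here.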
