Transfer Learning for Image-Based Malware Classification

In this paper, we consider the problem of malware detection and classification based on image analysis. We convert executable files to images and apply image recognition using deep learning (DL) models. To train these models, we employ transfer learning based on existing DL models that have been pre-trained on massive image datasets. We carry out various experiments with this technique and compare its performance to that of an extremely simple machine learning technique, namely, k-nearest neighbors (k-NN). For our k-NN experiments, we use features extracted directly from executables, rather than image analysis. While our image-based DL technique performs well in the experiments, surprisingly, it is outperformed by k-NN. However, we show that the DL models generalize better, in the sense that they outperform k-NN in simulated zero-day experiments.
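To make the executable-to-image step concrete, the following is a minimal sketch of one common way to do it: treat each byte as a grayscale pixel value (0 to 255) and wrap the byte stream into rows of fixed width. The function name `bytes_to_image`, the default width of 256, and the zero-padding of the final row are assumptions made for illustration; the abstract does not state which conversion parameters were used.

```python
from typing import List


def bytes_to_image(data: bytes, width: int = 256) -> List[List[int]]:
    """Wrap raw bytes into rows of `width` grayscale pixels (values 0-255).

    Hypothetical helper: the width and padding scheme are assumptions,
    not parameters taken from the paper.
    """
    if width <= 0:
        raise ValueError("width must be positive")
    # Zero-pad the tail so that the last row is complete.
    padding = (-len(data)) % width
    padded = data + bytes(padding)
    # Each slice of `width` bytes becomes one image row.
    return [list(padded[i:i + width]) for i in range(0, len(padded), width)]


if __name__ == "__main__":
    # Tiny stand-in for the contents of an executable file.
    sample = bytes([0x4D, 0x5A, 0x90, 0x00, 0x03])
    for row in bytes_to_image(sample, width=2):
        print(row)
```

The resulting two-dimensional array could then be resized to the input dimensions expected by a pre-trained image model before fine-tuning; that step depends on the model and is not shown here.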
