Convolutional neural networks and extreme learning machines for malware classification

Research in the field of malware classification often relies on machine learning models that are trained on high-level features, such as opcodes, function calls, and control flow graphs. Extracting such features is costly, since disassembly or code execution is generally required. In this paper, we conduct experiments to train and evaluate machine learning models for malware classification, based on features that can be obtained without disassembly or code execution. Specifically, we visualize malware samples as images and employ image analysis techniques using both two-dimensional images and one-dimensional vectors derived from images. We consider two machine learning techniques, namely, convolutional neural networks (CNNs) and extreme learning machines (ELMs). For images, we find that ELMs can achieve accuracies on par with CNNs, yet ELM training requires less than 2% of the time needed to train a comparable CNN. We also find that both ELMs and CNNs perform as well when trained on one-dimensional data as when trained on two-dimensional data. In the one-dimensional case, ELMs are still faster to train than CNNs, but only by a relatively small factor compared to the speedup observed for image-based training.
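
As an illustration of the two ideas summarized above, the following minimal sketch treats the raw bytes of a file as a grayscale image and trains a basic ELM on the flattened pixels. It is not the paper's implementation: the image width, hidden-layer size, tanh activation, and the names `bytes_to_image` and `ELM` are all assumptions chosen for brevity. The sketch does show the general reason ELM training tends to be fast: the hidden-layer weights are random and fixed, so only the output weights are fitted, with a single least-squares solve rather than iterative backpropagation.

```python
import numpy as np


def bytes_to_image(data: bytes, width: int = 64) -> np.ndarray:
    """Interpret raw bytes as rows of grayscale pixels (width is an assumed value)."""
    buf = np.frombuffer(data, dtype=np.uint8)
    rows = -(-len(buf) // width)  # ceiling division
    padded = np.zeros(rows * width, dtype=np.uint8)  # zero-pad the last row
    padded[: len(buf)] = buf
    return padded.reshape(rows, width)


class ELM:
    """Single-hidden-layer ELM: random fixed hidden weights, least-squares output weights."""

    def __init__(self, n_hidden: int = 256, seed: int = 0):
        self.n_hidden = n_hidden  # assumed size, not taken from the paper
        self.seed = seed

    def _hidden(self, X: np.ndarray) -> np.ndarray:
        # Hidden-layer activations; W and b are never updated after initialization.
        return np.tanh(X @ self.W + self.b)

    def fit(self, X: np.ndarray, y: np.ndarray) -> "ELM":
        rng = np.random.default_rng(self.seed)
        self.W = rng.normal(size=(X.shape[1], self.n_hidden))
        self.b = rng.normal(size=self.n_hidden)
        self.classes_ = np.unique(y)
        # One-hot targets, one column per class.
        T = (y[:, None] == self.classes_[None, :]).astype(float)
        # Output weights via the Moore-Penrose pseudo-inverse of the hidden activations.
        self.beta = np.linalg.pinv(self._hidden(X)) @ T
        return self

    def predict(self, X: np.ndarray) -> np.ndarray:
        scores = self._hidden(X) @ self.beta
        return self.classes_[np.argmax(scores, axis=1)]
```

In use, each sample would be converted with `bytes_to_image`, resized or truncated to a common shape, scaled to the range [0, 1], and flattened into one row of `X` before calling `ELM().fit(X, y)`. A CNN would instead consume the two-dimensional array directly.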
