Evaluation of Feature- and Signature-Based Training Approaches for Malware Classification Using Autoencoders

Malware analysis has become a critical area of research owing to the rapid growth in the development and use of internet-based systems. Recent advances in artificial intelligence (AI), particularly in data mining, have enabled AI-based malware classification and detection systems. Such systems are predominantly signature-based and are built on existing malware datasets. This paper evaluates the capability of feature-based malware classification using autoencoders. To do so, it presents a new approach for creating a synthetic malware dataset from signatures and features, which can be used to train and test both traditional and AI-based malware detection systems. Autoencoders are trained on feature-based and signature-based datasets and tested on the synthetic dataset, and further experiments are carried out with multiple datasets and network topologies. The results show that feature-based training is more effective than the signature-based approach on the synthetic, signature-based, and feature-based datasets. A 5-layer feature-based stacked autoencoder achieves a classification accuracy of 95.6%, compared with only 84.6% for the signature-based system.
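The abstract does not describe the network in detail, so the following NumPy code is only an illustrative sketch of the general idea behind a stacked autoencoder classifier, not the authors' implementation: sigmoid autoencoders are pre-trained greedily, layer by layer, and a softmax output layer is then trained on the final codes. The layer sizes `[16, 8, 4]`, learning rates, epoch counts, and the toy Bernoulli "feature" data are hypothetical assumptions; the paper's 5-layer topology, its datasets, and any fine-tuning of the full stack are not reproduced here.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def train_autoencoder(x, n_hidden, epochs=300, lr=2.0, seed=0):
    """Train one sigmoid autoencoder layer with full-batch gradient descent
    on mean squared reconstruction error. Returns the encoder weights,
    encoder bias and the per-epoch loss history."""
    rng = np.random.default_rng(seed)
    n_in = x.shape[1]
    w1 = rng.normal(0.0, 0.1, (n_in, n_hidden))
    b1 = np.zeros(n_hidden)
    w2 = rng.normal(0.0, 0.1, (n_hidden, n_in))
    b2 = np.zeros(n_in)
    losses = []
    for _ in range(epochs):
        h = sigmoid(x @ w1 + b1)          # encode
        r = sigmoid(h @ w2 + b2)          # decode (reconstruction)
        err = r - x
        losses.append(float(np.mean(err ** 2)))
        # Backpropagate the MSE gradient through both sigmoid layers.
        d_out = 2.0 * err / err.size * r * (1.0 - r)
        d_hid = (d_out @ w2.T) * h * (1.0 - h)
        w2 -= lr * (h.T @ d_out)
        b2 -= lr * d_out.sum(axis=0)
        w1 -= lr * (x.T @ d_hid)
        b1 -= lr * d_hid.sum(axis=0)
    return w1, b1, losses

def pretrain_stack(x, layer_sizes):
    """Greedy layer-wise pre-training: each autoencoder learns to
    reconstruct the codes produced by the previous one."""
    encoders, h, histories = [], x, []
    for i, size in enumerate(layer_sizes):
        w, b, losses = train_autoencoder(h, size, seed=i)
        encoders.append((w, b))
        histories.append(losses)
        h = sigmoid(h @ w + b)
    return encoders, h, histories

def encode(x, encoders):
    """Pass inputs through the pre-trained encoder stack."""
    for w, b in encoders:
        x = sigmoid(x @ w + b)
    return x

def train_softmax(h, y, n_classes, epochs=500, lr=1.0):
    """Softmax (multinomial logistic) output layer trained on the codes."""
    w = np.zeros((h.shape[1], n_classes))
    b = np.zeros(n_classes)
    onehot = np.eye(n_classes)[y]
    losses = []
    for _ in range(epochs):
        logits = h @ w + b
        logits -= logits.max(axis=1, keepdims=True)   # numerical stability
        p = np.exp(logits)
        p /= p.sum(axis=1, keepdims=True)
        losses.append(float(-np.mean(np.log(p[np.arange(len(y)), y]))))
        grad = (p - onehot) / len(y)
        w -= lr * (h.T @ grad)
        b -= lr * grad.sum(axis=0)
    return w, b, losses

# Hypothetical toy data: 200 binary feature vectors from two classes with
# different Bernoulli probabilities. This is NOT the paper's dataset.
rng = np.random.default_rng(42)
n, d = 200, 20
y = np.repeat([0, 1], n // 2)
probs = np.where(y[:, None] == 0,
                 np.r_[np.full(10, 0.8), np.full(10, 0.2)],
                 np.r_[np.full(10, 0.2), np.full(10, 0.8)])
x = (rng.random((n, d)) < probs).astype(float)

# Hypothetical layer sizes; fine-tuning of the whole stack is omitted.
encoders, codes, ae_losses = pretrain_stack(x, layer_sizes=[16, 8, 4])
w_out, b_out, clf_losses = train_softmax(codes, y, n_classes=2)
pred = np.argmax(encode(x, encoders) @ w_out + b_out, axis=1)
print("training accuracy:", np.mean(pred == y))
```

Greedy pre-training is used here because it is the classic way stacked autoencoders are built; a fuller implementation would normally follow it with supervised fine-tuning of all layers and would report accuracy on a held-out test set rather than on the training data.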
