Malware Detection Using Black-Box Neural Method

Because of the great loss and damage caused by malware, malware detection has become a central issue in computer security. Detection has to be fast and very accurate. Developing suitable methods requires high-quality benchmarks. One such benchmark is the Microsoft Kaggle malware challenge, run in 2015. Since then, over 50 papers have been published using this benchmark. The best results were achieved with complex feature engineering. In this work we analyze the black-box neural method and, as a novel contribution, evaluate its results against the Microsoft Kaggle malware challenge benchmark. It is tempting to use convolutional neural networks for malware analysis, following their great success in image analysis. However, even with balanced classes and dropout, this approach does not beat XGBoost with feature engineering, although some room for improvement exists. The situation is similar to that in language analysis: language is much more hierarchical than images, and apparently malware is too. Malware analysis still awaits an optimal neural network architecture.
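
The abstract mentions training with balanced classes but does not say how the balancing was done. As an illustration only, the sketch below shows one common option, inverse-frequency class weights, where each class receives the weight N / (K * n_c) for N samples, K classes, and n_c samples in class c. The function name `balanced_class_weights` and the toy labels are hypothetical and are not taken from the paper; the authors may have used a different technique, such as oversampling.

```python
from collections import Counter


def balanced_class_weights(labels):
    """Return {class: weight} using weight = N / (K * n_c).

    Rare classes get weights above 1 and frequent classes get weights
    below 1, so each class contributes equally to a weighted loss.
    """
    counts = Counter(labels)
    n_samples = len(labels)
    n_classes = len(counts)
    return {c: n_samples / (n_classes * n) for c, n in counts.items()}


# Hypothetical toy labels: three malware families with unequal sizes.
toy_labels = [1, 1, 1, 1, 1, 1, 2, 2, 3]
weights = balanced_class_weights(toy_labels)
print(weights)
```

Weights of this kind are typically passed to a weighted loss function during training, so that errors on small malware families are penalized more heavily than errors on large ones.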
