An Adversarial Machine Learning Method Based on OpCode N-grams Feature in Malware Detection

Machine learning has become an important method in malware detection. However, because machine learning models have known weaknesses, a large body of research on adversarial machine learning has emerged. At present, this research focuses mainly on image and speech recognition. In malware detection, modifying features can easily damage the integrity and functionality of the code, so the detection model is usually fooled by adding noise such as garbage instructions. In this paper, we propose an adversarial machine learning method against malware detection models based on OpCode n-gram features. We first collect large data sets of malicious and benign code, and use TF-IDF to extract OpCode n-grams with different values of n from the data sets. We then train three malware detection models based on OpCode n-grams. Weighing efficiency, accuracy, and interpretability together, we select XGBoost as the adversarial feature extraction model and extract adversarial features from it. Finally, to verify the accuracy of the extracted features, we conduct an adversarial machine learning experiment. The experimental results show that the adversarial method proposed in this paper can completely fool the machine learning detection model.
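The feature extraction step described above can be illustrated with a minimal sketch. It assumes that each program has already been disassembled into a list of opcode mnemonics, and it uses a plain textbook TF-IDF weighting (term frequency times log(N / df)); the abstract does not state which TF-IDF variant is used, so the formula, the function names `opcode_ngrams` and `tfidf`, and the toy samples are all illustrative assumptions rather than the method of the paper.

```python
import math
from collections import Counter


def opcode_ngrams(opcodes, n):
    """Return the contiguous opcode n-grams (as tuples) of one sequence."""
    return [tuple(opcodes[i:i + n]) for i in range(len(opcodes) - n + 1)]


def tfidf(samples, n):
    """Compute a TF-IDF weight for every opcode n-gram in every sample.

    tf  = occurrences of the n-gram in the sample / n-grams in the sample
    idf = log(N / df), where df counts the samples containing the n-gram
    (an assumed textbook variant, not necessarily the one in the paper)
    """
    counts = [Counter(opcode_ngrams(s, n)) for s in samples]
    df = Counter()
    for c in counts:
        df.update(c.keys())  # each sample contributes at most 1 per n-gram
    total = len(samples)
    weights = []
    for c in counts:
        size = sum(c.values())
        weights.append(
            {g: (k / size) * math.log(total / df[g]) for g, k in c.items()}
            if size else {}
        )
    return weights


# Toy opcode sequences (hypothetical, not taken from the paper's data set).
samples = [
    ["push", "mov", "call", "pop", "ret"],
    ["push", "mov", "xor", "jmp", "ret"],
    ["push", "mov", "call", "pop", "ret", "nop", "nop"],  # padded with nops
]
weights = tfidf(samples, n=2)
```

In this toy setup the 2-gram `("push", "mov")` occurs in every sample and so receives weight zero, while the `nop` padding in the third sample introduces 2-grams that the other samples lack. This is the kind of shift in the n-gram profile that inserted garbage instructions produce, although how the paper chooses which instructions to insert is not specified in the abstract.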
