Paper information - Malware Detection in PDF Files using Machine Learning

In this report we present how we used machine learning techniques to detect malicious behaviour in PDF files. To this end, we first set up an SVM (Support Vector Machine) classifier that was able to detect 99.7% of malware. However, this classifier was easy to fool with malicious PDFs that we forged to look like clean ones. We first proposed a very naive attack, which was easily stopped by establishing a threshold. We also implemented a gradient-descent attack to evade this SVM; this attack was almost 100% successful. To fix this problem, we devised countermeasures against the latter attack: more elaborate feature selection, combined with a threshold, allowed us to stop up to 99.99% of these attacks. Finally, using adversarial learning techniques, we were able to prevent gradient-descent attacks by iteratively feeding the SVM with forged malicious PDFs. We found that after 3 iterations, every PDF forged by gradient descent was detected, completely preventing the attack.
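To make the gradient-descent evasion idea concrete, here is a minimal sketch against a linear decision function of the kind a linear SVM produces. It is not the report's model or attack: the weights, bias, feature meanings, step size, and the rule that the attacker may only add content (never remove it) are all assumptions chosen for illustration.

```python
# Hypothetical linear decision function f(x) = w . x + b.
# f(x) > 0 is read as "malicious". All numbers below are invented.
WEIGHTS = [1.5, 2.0, -0.5, -0.25]  # assumed features: counts of four PDF keywords
BIAS = -1.0


def score(x):
    """Return the decision value of feature vector x."""
    return sum(w * xi for w, xi in zip(WEIGHTS, x)) + BIAS


def gradient_descent_evasion(x, step=4.0, max_iter=100):
    """Step x along the negative gradient of the score until score(x) <= 0.

    For a linear score the gradient with respect to x is simply WEIGHTS.
    Assumption: the attacker can only increase feature counts, so any
    component whose update would be negative is left unchanged.
    Returns the modified vector and the number of steps taken.
    """
    x = [float(xi) for xi in x]
    for iteration in range(max_iter):
        if score(x) <= 0:
            return x, iteration
        x = [xi + max(0.0, -step * w) for xi, w in zip(x, WEIGHTS)]
    return x, max_iter


malicious = [2, 1, 0, 0]
evaded, iterations = gradient_descent_evasion(malicious)
print(score(malicious), evaded, iterations, score(evaded))
```

In this toy setting the forged vector keeps its malicious features intact and only inflates the features with negative weight, which is why defences that bound feature values or retrain on forged samples, as described in the abstract, are natural responses.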
