EvadePDF: Towards Evading Machine Learning Based PDF Malware Classifiers

Machine Learning (ML) based classifiers, such as PDFRate, have seen significant development in recent times for identifying malware that masquerades as benign files; our study focuses on PDF files. However, unlike other fields where statistical techniques are used, malware detection violates a fundamental assumption of ML-based techniques: that the training data is representative of the prospective input. Instead, malware can be deliberately designed as an anomaly that breaks the ML classifier. We present a thorough study of, and our improvements to, one prominent project in this area, EvadeML, a Genetic Programming based technique for evading ML-based malware classifiers. EvadeML has shown a 100% success rate against two target PDF malware classifiers, PDFRate and Hidost. We modified EvadeML to improve its evasion efficiency against another PDF malware classifier, AnalyzePDF, and observed a significant improvement over the original EvadeML. We also tested our modified approach against PDFRate and achieved a 100% success rate, matching the original EvadeML.
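
To make the idea of a genetic search against a classifier concrete, the following is a minimal toy sketch in Python. It is not EvadeML's code or ours: the token names, the scoring function, the oracle, and all parameters are hypothetical stand-ins chosen for illustration. A sample is modeled as a tuple of tokens, the "classifier" scores the fraction of suspicious tokens, and the "oracle" treats a variant as still malicious only if a payload token survives. A real attack would mutate actual PDF structure and would need a behavioral check that each variant remains malicious.

```python
import random

# Hypothetical stand-ins: a "sample" is a tuple of feature tokens.
SUSPICIOUS = {"payload", "javascript", "openaction"}
BENIGN_POOL = ["font", "image", "metadata", "outline"]
THRESHOLD = 0.5  # assumed decision boundary: score >= THRESHOLD means "malicious"


def classifier_score(sample):
    """Toy classifier: fraction of tokens that look suspicious."""
    return sum(tok in SUSPICIOUS for tok in sample) / len(sample)


def oracle_is_malicious(sample):
    """Toy oracle: the variant keeps its behavior only if the payload survives."""
    return "payload" in sample


def mutate(sample, rng):
    """Apply one random insert, delete, or replace operation."""
    tokens = list(sample)
    pos = rng.randrange(len(tokens))
    op = rng.choice(["insert", "delete", "replace"])
    if op == "insert":
        tokens.insert(pos, rng.choice(BENIGN_POOL))
    elif op == "delete":
        if len(tokens) > 1:  # never produce an empty sample
            del tokens[pos]
    else:
        tokens[pos] = rng.choice(BENIGN_POOL)
    return tuple(tokens)


def fitness(sample):
    """Positive fitness means the variant evades and is still malicious."""
    if not oracle_is_malicious(sample):
        return float("-inf")  # variants that lose the payload are discarded
    return THRESHOLD - classifier_score(sample)


def evolve(seed, pop_size=20, generations=200, rng_seed=0):
    """Elitist genetic search: keep the fitter half, refill with mutants."""
    rng = random.Random(rng_seed)
    population = [seed] * pop_size
    for _ in range(generations):
        population.sort(key=fitness, reverse=True)
        if fitness(population[0]) > 0:
            return population[0]
        survivors = population[: pop_size // 2]
        children = [
            mutate(rng.choice(survivors), rng)
            for _ in range(pop_size - len(survivors))
        ]
        population = survivors + children
    best = max(population, key=fitness)
    return best if fitness(best) > 0 else None
```

Calling `evolve(("payload", "javascript", "openaction"))` searches for a variant that still contains the payload token but scores below the assumed threshold. The fitness function combines the two requirements an evasive sample must meet: the oracle rules out variants that lost their behavior, and the classifier score ranks the rest.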