The Portable Document Format (PDF) is an effective vehicle for spreading malware because it is so widely used, so PDF malware should not be taken lightly. Malware can be embedded in a PDF file through JavaScript, URL access, infected media, and other content. A variety of preventive measures can help limit its spread; this study uses classification to separate dangerous files from harmless ones. According to previous research, the two classification methods with the highest accuracy are Support Vector Machine and Random Forest. The dataset contains 500 samples in two classes, malicious and not malicious, with 21 malicious-PDF features as input to the classification process. The two methods are compared using values computed from the Confusion Matrix, and the results show that Random Forest outperforms Support Vector Machine, although its results are still not perfect.
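To illustrate how a confusion matrix can be used to compare two classifiers, the following is a minimal sketch in plain Python. It is not the study's implementation: the function names, the "benign" label for the not-malicious class, and the toy label lists are all assumptions made for illustration, and the two prediction lists stand in for the output of any two classifiers rather than for the reported Support Vector Machine or Random Forest results.

```python
from collections import Counter


def confusion_counts(y_true, y_pred, positive="malicious"):
    """Return (TP, FP, FN, TN) for a binary task, treating `positive` as the positive class."""
    counts = Counter()
    for truth, pred in zip(y_true, y_pred):
        if pred == positive:
            counts["tp" if truth == positive else "fp"] += 1
        else:
            counts["fn" if truth == positive else "tn"] += 1
    return counts["tp"], counts["fp"], counts["fn"], counts["tn"]


def summarize(y_true, y_pred):
    """Derive accuracy, precision, and recall from the confusion counts."""
    tp, fp, fn, tn = confusion_counts(y_true, y_pred)
    total = tp + fp + fn + tn
    return {
        "accuracy": (tp + tn) / total if total else 0.0,
        "precision": tp / (tp + fp) if tp + fp else 0.0,
        "recall": tp / (tp + fn) if tp + fn else 0.0,
    }


# Hypothetical toy labels, NOT data from the study.
y_true = ["malicious"] * 4 + ["benign"] * 4
pred_a = ["malicious", "malicious", "malicious", "benign",
          "benign", "benign", "benign", "malicious"]
pred_b = ["malicious", "malicious", "benign", "benign",
          "benign", "benign", "malicious", "malicious"]

for name, pred in (("classifier A", pred_a), ("classifier B", pred_b)):
    print(name, confusion_counts(y_true, pred), summarize(y_true, pred))
```

Comparing the two summaries side by side is the kind of comparison the abstract describes: the classifier with more true positives and true negatives relative to its errors scores higher on the derived metrics.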