A Comparative Analysis of Classifiers in the Recognition of Packed Executables

Although executable packing has legitimate uses, such as intellectual property protection and size reduction, malware developers use the same tools to obfuscate their code and thereby make static analysis harder. The BinStat application was proposed to recognize packed executables. It works in two major steps: feature extraction, which computes statistical and information-theoretic properties of a given binary; and classification, which uses a decision tree learned from the features of binaries already known to be packed or unpacked in order to classify new executables. The results obtained demonstrated the effectiveness of the tool, but its reliance on a single classifier is arguably a weakness, which we address in the present study. To that end, we rebuilt the training and test datasets and selected the following six classifiers for our analysis: classification and regression trees, random forest, k-nearest neighbors, naive Bayes, neural network and support vector machines. Our results show that the decision tree algorithm originally adopted in BinStat (C5.0) is not the best choice for this problem: random forest, k-nearest neighbors and support vector machines achieved the best predictive performance.
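As an illustration of the feature extraction step, the sketch below computes a few information-theoretic statistics from the raw bytes of a file. It is a minimal example under stated assumptions, not BinStat's actual feature set: the choice of byte-level Shannon entropy, the 256-byte block size, and the names byte_entropy, block_entropies and extract_features are all hypothetical and introduced here only for illustration.

```python
import math
from collections import Counter


def byte_entropy(data):
    """Shannon entropy of the byte distribution, in bits per byte (0.0 to 8.0)."""
    if not data:
        return 0.0
    total = len(data)
    counts = Counter(data)  # occurrences of each byte value
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def block_entropies(data, block_size=256):
    """Entropy of each consecutive block; the last block may be shorter."""
    return [
        byte_entropy(data[i:i + block_size])
        for i in range(0, len(data), block_size)
    ]


def extract_features(data, block_size=256):
    """Illustrative feature vector (hypothetical, not the BinStat features)."""
    blocks = block_entropies(data, block_size) or [0.0]
    mean = sum(blocks) / len(blocks)
    variance = sum((b - mean) ** 2 for b in blocks) / len(blocks)
    return {
        "global_entropy": byte_entropy(data),
        "mean_block_entropy": mean,
        "block_entropy_variance": variance,
        "max_block_entropy": max(blocks),
    }


if __name__ == "__main__":
    # Any byte string works here; a real study would read executables from disk.
    sample = b"MZ" + bytes(1000) + bytes(range(256)) * 4
    for name, value in extract_features(sample).items():
        print(name, round(value, 3))
```

A vector of this kind, computed for binaries whose packed or unpacked status is already known, is the sort of input on which any of the six classifiers could be trained. Compressed or encrypted content tends to have a byte distribution closer to uniform, which is why entropy-based statistics are a common starting point for packing detection.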
