Feature Collection and Selection in Malware Classification

To compensate for the shortcomings of traditional signature-based classification methods, supervised machine learning and deep learning algorithms are increasingly applied to malware detection and classification. Focusing on the Windows malware classification problem, we first introduce the techniques used to collect different kinds of features. We then discuss how the choice of features drawn from malware behavior affects classification results. The results show that fine-grained features usually outperform coarse-grained features, and that combining multiple features outperforms using a single feature under certain circumstances. In addition, static features cost less to collect and train on than dynamic features. Finally, weighing training time, the complexity of feature collection, and classification accuracy, we present our views on which features should be applied to malware classification in different situations.
