Malware family classification method based on static feature extraction

As malicious code engineering advances, new malware samples exhibit variability and polymorphism, and the number of malware variants keeps growing. Traditional signature-based detection methods can hardly detect such variants, so analyzing and detecting large-scale malware samples by means of machine learning is important for the cyber security field. Using 65,536 malware samples, this paper proposes a malware family classification method based on static feature extraction, with features drawn from three aspects: bytecode features, assembly code features, and PE features. We designed a series of experiments to evaluate the selected features and compared eight classifiers to identify the best-performing one. After feature selection and feature fusion, we achieved an F1 score of 93.56% with a random forest classifier.
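
As a rough illustration of the pipeline summarized above, the following minimal sketch shows one possible static bytecode feature, a simple form of feature fusion, and an F1 computation. It is not the paper's implementation. The helper names (`byte_histogram`, `fuse`, `macro_f1`) are hypothetical, and three choices are assumptions that the abstract does not confirm: a 256-bin byte histogram as the bytecode feature, plain concatenation as the fusion step, and macro averaging of the per-family F1 scores.

```python
from collections import Counter


def byte_histogram(data: bytes) -> list[float]:
    """Normalized frequency of each byte value (256 bins).

    Assumption: a byte histogram stands in for the paper's bytecode features.
    """
    counts = Counter(data)
    total = len(data) or 1  # avoid division by zero on empty input
    return [counts.get(b, 0) / total for b in range(256)]


def fuse(*groups: list[float]) -> list[float]:
    """Feature fusion by concatenating per-group vectors (assumed scheme)."""
    fused: list[float] = []
    for group in groups:
        fused.extend(group)
    return fused


def macro_f1(y_true: list[str], y_pred: list[str]) -> float:
    """Unweighted mean of per-family F1 scores (averaging mode is assumed)."""
    labels = sorted(set(y_true) | set(y_pred))
    scores = []
    for label in labels:
        pairs = list(zip(y_true, y_pred))
        tp = sum(t == label and p == label for t, p in pairs)
        fp = sum(t != label and p == label for t, p in pairs)
        fn = sum(t == label and p != label for t, p in pairs)
        denom = 2 * tp + fp + fn
        # F1 = 2*TP / (2*TP + FP + FN); defined as 0 when the family never occurs
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores) if scores else 0.0
```

In a fuller pipeline, vectors for the assembly code and PE feature groups would be passed to `fuse` alongside the byte histogram, and the fused vectors would then be given to a classifier such as a random forest.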