Detecting Malware Variants by Byte Frequency

To produce new malware quickly and cheaply, an attacker can simply modify the binary files of existing malware to create malware variants. Here, malware variants are all new malware samples produced, manually or automatically, from an existing malware sample. Even such simple modifications can change the signature of the original malware, so that the new variants confuse and bypass most popular signature-based anti-malware tools. In this paper we propose a novel byte frequency based detecting model (BFBDM) to address the problem of identifying malware variants. The byte frequency of a piece of software is the frequency with which each distinct unsigned byte value occurs in its binary file. To implement BFBDM, we define and calculate two metrics, the distance and the similarity, between the suspicious software and a base sample, which is a known malware. Our experiments indicate that if the distance is low and the similarity is high, the suspicious software is, with very high probability, a variant of the selected malware. The preliminary experimental results show that our model is efficient and effective for identifying malware variants, especially manually produced variants.
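
The idea can be illustrated with a minimal sketch. The abstract does not give the formulas for the two metrics, so the code below assumes Euclidean distance and cosine similarity over normalized 256-bin byte histograms as stand-ins. The function names and the thresholds `max_distance` and `min_similarity` are hypothetical and are not taken from the paper.

```python
import math
from collections import Counter
from typing import List


def byte_frequency(data: bytes) -> List[float]:
    """Relative frequency of each unsigned byte value (0..255) in data."""
    total = len(data)
    if total == 0:
        return [0.0] * 256
    counts = Counter(data)  # iterating bytes yields ints in 0..255
    return [counts.get(value, 0) / total for value in range(256)]


def distance(p: List[float], q: List[float]) -> float:
    """Euclidean distance between two histograms (assumed metric)."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))


def similarity(p: List[float], q: List[float]) -> float:
    """Cosine similarity between two histograms (assumed metric)."""
    dot = sum(a * b for a, b in zip(p, q))
    norm_p = math.sqrt(sum(a * a for a in p))
    norm_q = math.sqrt(sum(b * b for b in q))
    if norm_p == 0.0 or norm_q == 0.0:
        return 0.0
    return dot / (norm_p * norm_q)


def looks_like_variant(base: bytes, suspect: bytes,
                       max_distance: float = 0.05,
                       min_similarity: float = 0.95) -> bool:
    """Flag suspect when distance is low and similarity is high.

    Both thresholds are illustrative assumptions, not values from the paper.
    """
    p = byte_frequency(base)
    q = byte_frequency(suspect)
    return distance(p, q) <= max_distance and similarity(p, q) >= min_similarity


# Toy data only: a synthetic "base sample" and a lightly padded copy of it.
base_sample = bytes(range(256)) * 4
padded_copy = base_sample + b"\x90" * 8
unrelated = b"\x00" * 1024
```

In this toy setting, `looks_like_variant(base_sample, padded_copy)` flags the padded copy, while `looks_like_variant(base_sample, unrelated)` does not. In practice the byte strings would be read from the binary files under comparison, and the thresholds would need to be calibrated on labeled samples.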