Statistical signatures for fast filtering of instruction-substituting metamorphic malware

Introducing program variations via metamorphic transformations is one of the methods used by malware authors in order to help their programs slip past defenses. A method is presented for rapidly deciding whether or not an input program is likely to be a variant of a given metamorphic program. The method is defined for the prominent class of metamorphic engines that work by probabilistically selecting instruction-substituting program transformations. A model of the probabilistic engine is used to predictthe expected distribution of instruction forms for different generations ofvariants. These predicted distributions form a type of "statistical signature" for the output of the metamorphic engines. A classifier is defined based on distance between the observed and the predicted instruction form distributions. A case study using the W32.Evol virus shows the classifier can distinguish between malicious samples from multiple generations. The classification method may be useful for practical malware detection by serving as an inexpensive filter to avoid more in-depth analyses where they are unnecessary