Profile hidden Markov models and metamorphic virus detection

Metamorphic computer viruses “mutate” by changing their internal structure; consequently, different instances of the same virus may not share a common signature. Virus construction kits make it easy to generate metamorphic strains of a given virus. In contrast to standard hidden Markov models (HMMs), profile hidden Markov models (PHMMs) explicitly account for positional information. PHMMs are widely used in bioinformatics; for example, they are the most effective tool yet developed for finding family-related DNA sequences. In principle, this positional information could yield stronger models for virus detection, but PHMMs raise many more practical difficulties than standard HMMs. In this paper, we consider the utility of PHMMs for detecting metamorphic virus variants produced by virus construction kits. We generate a PHMM for each construction kit under consideration and use the resulting models to score virus and non-virus files. Our results are encouraging, but several problems must be resolved before the technique is truly practical.
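The abstract does not spell out how a PHMM is built or how a file is scored. The sketch below follows the standard textbook profile HMM construction from bioinformatics (match, insert and delete states estimated from a multiple sequence alignment), followed by Viterbi log-odds scoring. It is not the paper's implementation. Assumptions: the opcode sequences are already aligned (the alignment step is not shown); a column counts as a match column when fewer than half the rows are gaps; counts get add-one pseudocounts; the background distribution is uniform; and the score is the Viterbi log-odds divided by sequence length. The paper may score differently, for example with the forward algorithm. The names `build_phmm`, `viterbi_score`, `MSA` and `ALPHABET` and the toy alignment are hypothetical.

```python
import math
from collections import Counter

NEG_INF = float("-inf")


def build_phmm(msa, alphabet, pseudo=1.0):
    """Estimate a profile HMM from an alignment: equal-length symbol sequences
    in which "-" marks a gap. Returns (L, log_a, log_e)."""
    n = len(msa)
    # Assumed rule: a column is a match column if fewer than half the rows are gaps.
    is_match = [sum(s[c] == "-" for s in msa) < n / 2 for c in range(len(msa[0]))]
    L = sum(is_match)
    trans = Counter()  # (from_state, to_state) -> count
    emit = {("M", k): Counter() for k in range(1, L + 1)}
    emit.update({("I", k): Counter() for k in range(L + 1)})
    for seq in msa:
        state, k = ("M", 0), 0  # ("M", 0) is the silent Begin state
        for c, sym in enumerate(seq):
            if is_match[c]:
                k += 1
                nxt = ("M", k) if sym != "-" else ("D", k)
            elif sym != "-":
                nxt = ("I", k)
            else:
                continue  # gap in an insert column: no state is visited
            if sym != "-":
                emit[nxt][sym] += 1
            trans[(state, nxt)] += 1
            state = nxt
        trans[(state, ("M", L + 1))] += 1  # ("M", L + 1) is the silent End state

    # Normalise transition counts (plus pseudocounts) over each state's successors.
    sources = [("M", 0)] + [(kind, k) for k in range(1, L + 1) for kind in "MD"]
    sources += [("I", k) for k in range(L + 1)]
    log_a = {}
    for s in sources:
        k = s[1]
        targets = [("M", k + 1), ("I", k)] + ([("D", k + 1)] if k < L else [])
        total = sum(trans[(s, t)] + pseudo for t in targets)
        for t in targets:
            log_a[(s, t)] = math.log((trans[(s, t)] + pseudo) / total)

    # Emission log-odds against an assumed uniform background distribution q.
    q = 1.0 / len(alphabet)
    log_e = {}
    for st, cnt in emit.items():
        total = sum(cnt.values()) + pseudo * len(alphabet)
        for x in alphabet:
            log_e[(st, x)] = math.log((cnt[x] + pseudo) / total / q)
    return L, log_a, log_e


def viterbi_score(seq, model):
    """Viterbi log-odds score of seq against the model, divided by len(seq)."""
    L, log_a, log_e = model

    def a(s, t):
        return log_a.get((s, t), NEG_INF)  # missing transition (e.g. from D_0)

    n = len(seq)
    V = {kind: [[NEG_INF] * (n + 1) for _ in range(L + 1)] for kind in "MID"}
    V["M"][0][0] = 0.0  # in Begin, nothing emitted yet
    for i in range(n + 1):
        for j in range(L + 1):
            if i > 0 and j > 0:  # match state M_j emits seq[i-1]
                V["M"][j][i] = log_e[(("M", j), seq[i - 1])] + max(
                    V[p][j - 1][i - 1] + a((p, j - 1), ("M", j)) for p in "MID")
            if i > 0:  # insert state I_j emits seq[i-1]
                V["I"][j][i] = log_e[(("I", j), seq[i - 1])] + max(
                    V[p][j][i - 1] + a((p, j), ("I", j)) for p in "MID")
            if j > 0:  # delete state D_j is silent
                V["D"][j][i] = max(
                    V[p][j - 1][i] + a((p, j - 1), ("D", j)) for p in "MID")
    best = max(V[p][L][n] + a((p, L), ("M", L + 1)) for p in "MID")
    return best / max(n, 1)


# Hypothetical toy alignment of opcode sequences from one "family".
MSA = [
    ["push", "mov", "add", "-", "call", "ret"],
    ["push", "mov", "add", "nop", "call", "ret"],
    ["push", "-", "add", "-", "call", "ret"],
    ["push", "mov", "sub", "-", "call", "ret"],
]
ALPHABET = ["push", "mov", "add", "sub", "nop", "call", "ret", "jmp", "xor"]

if __name__ == "__main__":
    model = build_phmm(MSA, ALPHABET)
    print(viterbi_score(["push", "mov", "add", "call", "ret"], model))  # family-like
    print(viterbi_score(["xor", "jmp", "xor", "jmp", "xor"], model))  # unrelated
```

A family-like opcode sequence should score above zero and an unrelated one below zero. Building one model per construction kit and thresholding each file's score is one plausible way to separate virus from non-virus files, but the threshold would have to be chosen empirically. Dividing by length keeps scores comparable across files of different sizes.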
