Approximate Disassembly using Dynamic Programming

Most commercial anti-virus software uses signature-based techniques to detect whether a file is infected by a virus. However, signature-based detection systems are unable to detect metamorphic viruses, since such viruses change their internal structure from generation to generation. Previous work has shown that hidden Markov models (HMMs) can be used to detect metamorphic viruses. In this technique, the code is disassembled and the resulting opcode sequences are used for training and detection. Because of the disassembly step, this process is not efficient enough when a decision must be made in real time. In this project, we explore whether dynamic programming can be used to speed up disassembly with minimal loss of accuracy. Dynamic programming is generally applied to problems that have two key attributes: optimal substructure and overlapping subproblems. During each iteration, our algorithm reads part of the input stream from the executable file and determines assembly instructions, thus dividing the problem into subproblems. We have created a score matrix representing digraphs of the most common opcodes, and we have implemented a dynamic program based on this scoring matrix. For various file sizes, we measure the time taken by our dynamic program and show that our approach is significantly faster than a standard disassembler (OllyDbg). Finally, we analyze the accuracy of our results.

Acknowledgements

I would like to thank Dr. Mark Stamp for guiding and encouraging me throughout the project. I would also like to thank my committee members, Dr. Sami Khuri and Dr. Robert Chun, for helping me during the project.
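The abstract does not spell out the recurrence behind the dynamic program, so the following Python sketch is only one plausible reading of it, under loudly stated assumptions. It assumes the program chooses, byte by byte, either to decode an instruction or to skip a byte as data, and that it maximizes the sum of opcode-digraph scores. The opcode table, the digraph scores, the data penalty, and the names `OPCODES`, `DIGRAPH`, `DATA_PENALTY` and `approx_disassemble` are all hypothetical illustrations, not the thesis's actual matrix or code; real IA-32 decoding (prefixes, ModRM, SIB) is replaced here by fixed instruction lengths.

```python
from typing import Dict, List, Tuple

# HYPOTHETICAL toy opcode table: first byte -> (mnemonic, total instruction length).
# Real IA-32 decoding depends on prefixes, ModRM and SIB bytes; here each opcode is
# assumed to have a fixed length (register-only ModRM forms for mov/xor).
OPCODES: Dict[int, Tuple[str, int]] = {
    0x55: ("push", 1),  # push ebp
    0x5D: ("pop", 1),   # pop ebp
    0x89: ("mov", 2),   # mov r/m32, r32
    0x31: ("xor", 2),   # xor r/m32, r32
    0xE8: ("call", 5),  # call rel32
    0xC3: ("ret", 1),   # ret
}

# HYPOTHETICAL digraph scores: reward for mnemonic b immediately following mnemonic a.
DIGRAPH: Dict[Tuple[str, str], int] = {
    ("START", "push"): 3, ("push", "mov"): 5, ("mov", "xor"): 2,
    ("mov", "call"): 3, ("xor", "call"): 3, ("call", "pop"): 2, ("pop", "ret"): 5,
}
DEFAULT_SCORE = 0   # score for a digraph that is not in the table
DATA_PENALTY = -4   # score for treating one byte as data ("db")


def approx_disassemble(code: bytes) -> Tuple[int, List[str]]:
    """Return the maximum total score and the token sequence that covers `code`."""
    n = len(code)
    # best[i][state] = (score, previous offset, previous state) for the best
    # decoding of code[:i] whose last token is `state`; each entry is a subproblem
    # reused by every longer prefix.
    best: List[Dict[str, Tuple[int, int, str]]] = [{} for _ in range(n + 1)]
    best[0]["START"] = (0, -1, "")

    def relax(j: int, state: str, score: int, i: int, prev: str) -> None:
        # Keep only the highest-scoring way to reach (offset j, state).
        if state not in best[j] or score > best[j][state][0]:
            best[j][state] = (score, i, prev)

    for i in range(n):
        for prev, (score, _, _) in best[i].items():
            entry = OPCODES.get(code[i])
            if entry is not None and i + entry[1] <= n:
                mnem, length = entry
                gain = DIGRAPH.get((prev, mnem), DEFAULT_SCORE)
                relax(i + length, mnem, score + gain, i, prev)
            # Any byte may also be skipped as data, so every offset stays reachable.
            relax(i + 1, "db", score + DATA_PENALTY, i, prev)

    # Pick the best final state and follow the back-pointers to recover the tokens.
    end_state = max(best[n], key=lambda s: best[n][s][0])
    total = best[n][end_state][0]
    tokens: List[str] = []
    pos, state = n, end_state
    while pos > 0:
        _, prev_pos, prev_state = best[pos][state]
        tokens.append(state)
        pos, state = prev_pos, prev_state
    tokens.reverse()
    return total, tokens
```

For the 12-byte fragment `bytes([0x55, 0x89, 0xE5, 0x31, 0xC0, 0xE8, 0, 0, 0, 0, 0x5D, 0xC3])`, this sketch returns a score of 20 with the tokens `push, mov, xor, call, pop, ret`. Keeping the previous mnemonic in the state is what allows a digraph score to be applied at each step; with a fixed opcode set, the number of states per offset is bounded, so the work grows linearly with the input length.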
