Using hidden Markov model for dynamic malware analysis: First impressions

Malware developers keep devising new techniques to evade detection. Furthermore, with malware construction kits and metamorphic virus generators commonly available, creating obfuscated malware has become child's play. This makes the task of the anti-malware industry a challenging one: it must analyze tens of thousands of new malware samples every day in order to defend against the malware threat. The silver lining is that most malware generated by such means differs only syntactically, and hence techniques employing dynamic analysis and behavior modeling can be used effectively for classifying malware. In this paper we propose a malware classification scheme based on Hidden Markov Models that uses system calls as the observed symbols. Our approach combines the powerful statistical pattern analysis capability of Hidden Markov Models with the proven capacity of system calls as discriminating dynamic features for countering malware obfuscation. Testing the proposed technique on system call logs of real malware shows that it has the potential to classify unknown malware effectively into known classes.
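The general idea of scoring a system call trace against one Hidden Markov Model per malware class can be sketched as follows. This is a minimal illustration, not the scheme evaluated in the paper: the abstract does not state the decision rule, so picking the class whose model gives the highest likelihood is an assumption, as are the system call alphabet, the two family names, and all probabilities, which are hand-picked toy values. In practice the parameters would be estimated from system call logs of each class, for example with Baum-Welch training.

```python
import math

# Hypothetical alphabet: each system call is encoded as an integer symbol.
SYSCALLS = ["open", "read", "write", "connect"]

def log_likelihood(obs, start, trans, emit):
    """Scaled forward algorithm: returns log P(obs | model).

    start[i]    = probability of starting in hidden state i
    trans[i][j] = probability of moving from state i to state j
    emit[i][k]  = probability of emitting symbol k while in state i
    """
    n = len(start)
    # Forward variable for the first observation.
    alpha = [start[i] * emit[i][obs[0]] for i in range(n)]
    log_prob = 0.0
    for t in range(len(obs)):
        if t > 0:
            # Propagate one step and weight by the emission probability.
            alpha = [
                sum(alpha[i] * trans[i][j] for i in range(n)) * emit[j][obs[t]]
                for j in range(n)
            ]
        scale = sum(alpha)
        if scale == 0.0:
            return float("-inf")  # the model cannot produce this trace
        # Rescale to avoid underflow; the logs of the scales sum to log P(obs).
        log_prob += math.log(scale)
        alpha = [a / scale for a in alpha]
    return log_prob

def classify(obs, models):
    """Return the class whose HMM assigns the trace the highest likelihood."""
    return max(models, key=lambda name: log_likelihood(obs, *models[name]))

# Toy two-state models (assumed values, not trained on real malware).
MODELS = {
    "file_infector": (
        [0.6, 0.4],
        [[0.7, 0.3], [0.4, 0.6]],
        [[0.4, 0.4, 0.15, 0.05], [0.2, 0.2, 0.5, 0.1]],
    ),
    "net_worm": (
        [0.5, 0.5],
        [[0.6, 0.4], [0.3, 0.7]],
        [[0.1, 0.1, 0.2, 0.6], [0.05, 0.15, 0.3, 0.5]],
    ),
}

def encode(trace):
    """Map system call names to their integer symbols."""
    return [SYSCALLS.index(name) for name in trace]

if __name__ == "__main__":
    for trace in (["open", "read", "write", "read"],
                  ["connect", "connect", "write", "connect"]):
        print(trace, "->", classify(encode(trace), MODELS))
```

The rescaling step matters for real logs: system call traces are long, and the unscaled forward probabilities would underflow to zero well before the end of a trace.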
