Malware Detection Using a Hidden Markov Model Based on the Markov Blanket Feature Selection Method

In general, all malicious code that can potentially harm a single computer or a network of computers is categorized as malware. Virus development kits have advanced considerably, many kinds of malware exist today, and the number of web users keeps growing. As a result, malware is spreading rapidly across every kind of computer system. Today the main approach to finding and detecting malware is signature-based detection. However, as metamorphic malware has advanced, these techniques have become less effective at detecting it. In this research, a new approach to malware detection is introduced that combines machine learning methods with an n-gram model and statistical analysis. Using the Markov blanket method as the feature selection technique reduced the number of features by approximately 86% on average. A set of sequences was then produced to train a hidden Markov model (HMM). The trained HMM achieved an accuracy of about 90% in detecting malware and in classifying files as malware or benign.
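The abstract gives no implementation details. The Python below is therefore only a minimal sketch of how the pieces could fit together, and it rests on several assumptions. Each executable is assumed to be already disassembled into an opcode list. The Markov blanket step is assumed to have already produced a small set of selected opcode bigrams (the hypothetical `VOCAB`). Each file becomes an observation sequence by mapping every bigram to its index in that set, and all unselected bigrams share one extra symbol. Finally, one HMM is assumed to have been trained per class. The sketch omits both training (for example with Baum-Welch) and the Markov blanket search. The parameters in `MALWARE_HMM` and `BENIGN_HMM` are invented placeholders, not values from this work. A file is labelled by whichever HMM assigns its sequence the higher log-likelihood, computed with the scaled forward algorithm.

```python
import math
from typing import Dict, List, Sequence, Tuple

NGram = Tuple[str, ...]
# An HMM is (start probabilities, transition matrix, emission matrix).
HMM = Tuple[List[float], List[List[float]], List[List[float]]]


def opcode_ngrams(opcodes: Sequence[str], n: int = 2) -> List[NGram]:
    """Slide a window of width n over an opcode sequence."""
    return [tuple(opcodes[i:i + n]) for i in range(len(opcodes) - n + 1)]


def encode(ngrams: List[NGram], vocab: Dict[NGram, int]) -> List[int]:
    """Map each n-gram to its index in the selected vocabulary.

    Assumption: n-grams not kept by feature selection share one extra
    'other' symbol whose index is len(vocab).
    """
    other = len(vocab)
    return [vocab.get(g, other) for g in ngrams]


def log_likelihood(obs: List[int], start: List[float],
                   trans: List[List[float]], emit: List[List[float]]) -> float:
    """Scaled forward algorithm: returns log P(obs | HMM).

    alpha is renormalized at every step so it cannot underflow on long
    sequences; the log of each normalizing constant is accumulated.
    """
    if not obs:
        raise ValueError("observation sequence must be non-empty")
    k = len(start)
    alpha = [start[s] * emit[s][obs[0]] for s in range(k)]
    log_p = 0.0
    for t in range(len(obs)):
        if t > 0:
            o = obs[t]
            alpha = [sum(alpha[r] * trans[r][s] for r in range(k)) * emit[s][o]
                     for s in range(k)]
        scale = sum(alpha)
        log_p += math.log(scale)
        alpha = [a / scale for a in alpha]
    return log_p


def classify(obs: List[int], malware_hmm: HMM, benign_hmm: HMM) -> str:
    """Label a file by whichever class HMM explains its sequence better."""
    if log_likelihood(obs, *malware_hmm) > log_likelihood(obs, *benign_hmm):
        return "malware"
    return "benign"


# Hypothetical selected features and placeholder parameters (2 states, 3 symbols).
VOCAB: Dict[NGram, int] = {("push", "call"): 0, ("xor", "jmp"): 1}
MALWARE_HMM: HMM = ([0.6, 0.4],
                    [[0.7, 0.3], [0.4, 0.6]],
                    [[0.1, 0.7, 0.2], [0.2, 0.5, 0.3]])
BENIGN_HMM: HMM = ([0.5, 0.5],
                   [[0.8, 0.2], [0.3, 0.7]],
                   [[0.6, 0.1, 0.3], [0.5, 0.2, 0.3]])

if __name__ == "__main__":
    ops = ["xor", "jmp", "xor", "jmp", "push", "call"]
    seq = encode(opcode_ngrams(ops), VOCAB)  # [1, 2, 1, 2, 0]
    print(classify(seq, MALWARE_HMM, BENIGN_HMM))  # → malware
```

Renormalizing at each step is one standard way to keep forward probabilities from underflowing on long opcode traces. Comparing the scores of two class-specific models is only one possible decision rule. Thresholding a single model's log-likelihood is another.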
