A Novel Approach to Malware Detection using Static Classification

Malware, often loosely called a computer virus, is one of the top security threats to computer systems around the globe. It is evolving at a very rapid pace and continually finds new ways to exploit and infect the systems of enterprises and businesses. Malware uses different techniques to camouflage itself and prolong its lifetime. In this paper, we present a simple technique based on static features extracted from Windows PE files. The features are extracted not only from the header of the malware but also from its payload, i.e. the body of the malware. The static features used are a combination of Function Call Frequency and Opcode Frequency, which together differentiate malware from clean files. This combination of features makes it a new approach to malware detection, providing an accuracy of 97% on a dataset of 1,230 executable files comprising 800 malware samples and 430 clean files. For classification, we use machine learning algorithms available in the WEKA library. Based on the results obtained, we conclude that both features considered in this work play a significant role in distinguishing malicious files from clean ones.

Keywords: Static Malware Analysis; Machine Learning; Classification
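To make the combined feature set concrete, here is a minimal sketch of how Opcode Frequency and Function Call Frequency could be turned into one feature vector per executable. It assumes the opcode sequence and the list of called functions have already been extracted from each PE file by a disassembler (extraction is not shown). The vocabularies, toy samples, and helper names (`frequency_vector`, `combined_features`, `train_centroids`, `classify`) are hypothetical. Because the exact WEKA classifiers and settings are not reproduced here, a simple nearest-centroid classifier is swapped in purely for illustration; it is not the paper's method.

```python
from collections import Counter
from math import sqrt

# Hypothetical vocabularies; a real system would derive them from the training set.
OPCODE_VOCAB = ["mov", "push", "call", "xor", "jmp"]
CALL_VOCAB = ["CreateFileA", "WriteProcessMemory", "VirtualAllocEx", "MessageBoxA"]


def frequency_vector(tokens, vocabulary):
    """Relative frequency of each vocabulary item among tokens (0.0 if absent)."""
    total = len(tokens)
    if total == 0:
        return [0.0] * len(vocabulary)
    counts = Counter(tokens)
    return [counts[item] / total for item in vocabulary]


def combined_features(opcodes, calls):
    """Concatenate opcode-frequency and function-call-frequency features."""
    return frequency_vector(opcodes, OPCODE_VOCAB) + frequency_vector(calls, CALL_VOCAB)


def train_centroids(samples):
    """samples: list of (opcodes, calls, label). Returns {label: mean feature vector}."""
    grouped = {}
    for opcodes, calls, label in samples:
        grouped.setdefault(label, []).append(combined_features(opcodes, calls))
    return {
        label: [sum(column) / len(vectors) for column in zip(*vectors)]
        for label, vectors in grouped.items()
    }


def classify(opcodes, calls, centroids):
    """Nearest-centroid stand-in: pick the label whose centroid is closest (Euclidean)."""
    x = combined_features(opcodes, calls)

    def distance(label):
        return sqrt(sum((a - b) ** 2 for a, b in zip(x, centroids[label])))

    return min(centroids, key=distance)


# Toy training data, invented for illustration (not the paper's dataset).
training = [
    (["xor", "xor", "jmp", "call", "push"], ["VirtualAllocEx", "WriteProcessMemory"], "malware"),
    (["xor", "jmp", "jmp", "call"], ["WriteProcessMemory", "VirtualAllocEx", "CreateFileA"], "malware"),
    (["mov", "mov", "push", "call"], ["CreateFileA", "MessageBoxA"], "clean"),
    (["mov", "push", "push", "call", "mov"], ["MessageBoxA"], "clean"),
]

centroids = train_centroids(training)
prediction = classify(["xor", "jmp", "xor", "call"], ["VirtualAllocEx", "WriteProcessMemory"], centroids)
print(prediction)  # malware
```

Using relative rather than raw counts keeps files of different sizes comparable. In a WEKA-based pipeline, vectors like these would instead be written out (for example as an ARFF file) and passed to WEKA's classifiers.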
