Behavioural detection with API call-grams to identify malicious PE files

Present-day malware uses stealthy and dynamic techniques to gain administrative rights and take control of the victim computer [10]. Malware writers rely on evasion techniques such as code obfuscation, packing, compression, encryption, or polymorphism to avoid detection by Anti-Virus (AV) scanners, because AV scanners primarily use signature-based detection. According to the FireEye threat report for the second half of 2011 [15], the top 50 malware generated 80% of infections. Malware such as Zeus, Conficker, and Koobface has become stealthier through the use of pay-per-install toolkits such as Blackhole [15]. Pay-per-install toolkits make the samples dynamic in nature, which has led to an exponential increase in unknown, zero-day malware [14]. Given this exponential increase in the number of encoded malware samples, a good behavioural scheme is essential to complement the signature-based approach. Behavioural analysis can detect unknown, encrypted, zero-day malware, but these methods result in an increased false alarm rate. We propose a behaviour model that represents an abstraction of a binary by analyzing the Application Programming Interface (API) calls made by Windows Portable Executable (PE) [25] files. Our focus is on extracting temporal snapshots of malware and benign executables, known as API call-grams. API functions are primarily written for software development kits to produce sane code, whereas malcode writers misuse the available functionality to keep their code compact and escape detection by AV software.
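As an illustration of the call-gram idea, the following minimal Python sketch slides a window of length n over an ordered API call trace and counts each resulting sequence. It assumes that a call-gram is a contiguous n-gram of API names; the abstract does not give the exact construction, so this is not necessarily the authors' procedure. The function names `call_grams` and `call_gram_counts` and the sample trace are hypothetical.

```python
from collections import Counter
from typing import List, Tuple


def call_grams(calls: List[str], n: int) -> List[Tuple[str, ...]]:
    """Return every contiguous window of n API names, in trace order."""
    if n < 1:
        raise ValueError("n must be at least 1")
    # A trace shorter than n yields no windows, so the result is empty.
    return [tuple(calls[i:i + n]) for i in range(len(calls) - n + 1)]


def call_gram_counts(calls: List[str], n: int) -> Counter:
    """Count how often each call-gram occurs in a single trace."""
    return Counter(call_grams(calls, n))


# Hypothetical trace for illustration only; not taken from the paper's data.
trace = [
    "LoadLibraryA",
    "GetProcAddress",
    "VirtualAlloc",
    "WriteProcessMemory",
    "CreateRemoteThread",
    "LoadLibraryA",
    "GetProcAddress",
]

counts = call_gram_counts(trace, 2)
for gram, freq in counts.most_common():
    print(" -> ".join(gram), freq)
```

Counts of this kind, computed separately for malware and benign traces, could serve as feature vectors for a classifier. The choice of n trades context against sparsity: longer call-grams capture more ordering information but recur less often.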

[1] Christopher Krügel, et al. BareBox: efficient malware analysis on bare-metal, 2011, ACSAC '11.

[2] Yuval Elovici, et al. Unknown malcode detection via text categorization and the imbalance problem, 2008, 2008 IEEE International Conference on Intelligence and Security Informatics.

[3] Ron Kohavi, et al. A Study of Cross-Validation and Bootstrap for Accuracy Estimation and Model Selection, 1995, IJCAI.

[4] Vijay Laxmi, et al. MEDUSA: MEtamorphic malware dynamic analysis using signature from API, 2010, SIN.

[5] Christopher Krügel, et al. A survey on automated dynamic malware-analysis techniques and tools, 2012, CSUR.

[6] Mansour Ahmadi, et al. Semantic Malware Detection by Deploying Graph Mining, 2012.

[7] Yoav Freund, et al. Large Margin Classification Using the Perceptron Algorithm, 1998, COLT.

[8] Vipin Kumar, et al. Introduction to Data Mining, (First Edition), 2005.

[9] Fabrice Bellard, et al. QEMU, a Fast and Portable Dynamic Translator, 2005, USENIX Annual Technical Conference, FREENIX Track.

[10] Kouichi Sakurai, et al. A behavior based malware detection scheme for avoiding false positive, 2010, 2010 6th IEEE Workshop on Secure Network Protocols.

[11] Galen C. Hunt, et al. Detours: binary interception of Win32 functions, 1999.

[12] Christopher Krügel, et al. Scalable, Behavior-Based Malware Clustering, 2009, NDSS.

[13] Radu State, et al. Malware behaviour analysis, 2008, Journal in Computer Virology.

[14] Vijay Laxmi, et al. PEAL - Packed Executable AnaLysis, 2011, ADCONS.

[15] Muhammad Zubair Shafiq, et al. Using spatio-temporal information in API calls with machine learning algorithms for malware detection, 2009, AISec '09.

[16] Guillaume Bonfante, et al. Architecture of a morphological malware detector, 2009, Journal in Computer Virology.

[17] Md. Rafiqul Islam, et al. Differentiating malware from cleanware using behavioural analysis, 2010, 2010 5th International Conference on Malicious and Unwanted Software.

[18] Leo Breiman, et al. Random Forests, 2001, Machine Learning.

[19] Sattar Hashemi, et al. Malware detection based on mining API calls, 2010, SAC '10.