Malware detection based on mining API calls

Financial loss due to malware nearly doubles every two years. For instance, in 2006 malware caused nearly 33.5 million GBP in direct financial losses to member organizations of banks in the UK alone. Recent malware cannot be detected by traditional signature-based anti-malware tools because of its polymorphic and/or metamorphic nature. Detecting malware by its immutable characteristics has recently become industrial practice, but the datasets used are not public. As a result, the results are not reproducible, and conducting research in an academic setting is difficult. In this work, we not only significantly improve a recent method of malware detection based on mining Application Programming Interface (API) calls, but also create the first public dataset to promote malware research. Our technique first reads the sets of API calls used in a collection of Portable Executable (PE) files, then generates a set of discriminative and domain-interpretable features. These features are then used to train a classifier to detect unseen malware. We achieve a detection rate of 99.7% while keeping accuracy as high as 98.3%. Our method improves on the state of the art in several respects: accuracy increased by 5.24%, detection rate increased by 2.51%, and the false alarm rate decreased from 19.86% to 1.51%. This project's data and source code can be found at http://home.shirazu.ac.ir/~sami/malware.
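To make the pipeline above concrete, the following is a minimal sketch and not the authors' implementation. It assumes the API call sets have already been extracted from the PE files, picks "discriminative" API calls with a simple support-gap rule (a stand-in chosen for illustration; the abstract does not specify the mining procedure), and computes the three reported metrics using their common confusion-matrix definitions, which are also an assumption here. The names `discriminative_apis`, `to_feature_vector`, `detection_metrics`, and the `min_gap` parameter are hypothetical.

```python
from collections import Counter


def discriminative_apis(malware_sets, benign_sets, min_gap=0.75):
    """Select API names whose support differs strongly between the classes.

    Assumption: support = fraction of files in a class that import the API.
    This support-gap rule is illustrative only, not the paper's method.
    """
    mal_count = Counter(api for s in malware_sets for api in s)
    ben_count = Counter(api for s in benign_sets for api in s)
    selected = []
    for api in sorted(set(mal_count) | set(ben_count)):
        gap = mal_count[api] / len(malware_sets) - ben_count[api] / len(benign_sets)
        if abs(gap) >= min_gap:
            selected.append(api)
    return selected


def to_feature_vector(api_set, features):
    """Encode one file as a binary vector: 1 if the file uses the API."""
    return [1 if api in api_set else 0 for api in features]


def detection_metrics(y_true, y_pred):
    """Compute metrics from labels (1 = malware, 0 = benign).

    Assumed definitions: detection rate = TP / (TP + FN),
    false alarm rate = FP / (FP + TN), accuracy = (TP + TN) / N.
    Both classes must be present in y_true.
    """
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    return {
        "detection_rate": tp / (tp + fn),
        "false_alarm_rate": fp / (fp + tn),
        "accuracy": (tp + tn) / len(pairs),
    }


# Toy, made-up API call sets (one set per file), for illustration only.
malware = [
    {"CreateRemoteThread", "WriteProcessMemory", "VirtualAllocEx"},
    {"CreateRemoteThread", "WriteProcessMemory", "RegSetValueExA"},
]
benign = [
    {"CreateFileA", "ReadFile"},
    {"CreateFileA", "RegSetValueExA"},
]

features = discriminative_apis(malware, benign)
train_x = [to_feature_vector(s, features) for s in malware + benign]
train_y = [1] * len(malware) + [0] * len(benign)
```

The resulting `train_x` and `train_y` could be passed to any standard classifier; the binary feature vectors stay interpretable because each column corresponds to a named API call.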
