An unknown malware detection scheme based on the features of graph

The traditional malware detection schemes based on specific signature give an unsatisfactory performance as disposing the previously unknown malware, so the general features of binary files should be explored to solve this problem. Recently, classification algorithms were employed successfully to choose the features in unknown malicious code, and most of the works use byte or operation code sequence n-gram representation of the executables. However, these n-gram representations are heavily dependent on the training data. In this paper, we present a graph-based method to detect unknown malware. The function call graph of an executable, which includes the functions and the call relations between them, is selected as the representation of the executable in this method. The features are defined according to both the statistical information and the topology of the function call graph. They are extracted and processed through machine learning to classify unknown Portable Executable files. For the sake of fixed sum of the features, the graph-based method can avoid so many features found in other methods. In our experiments, three types of malware datasets were tested, and as high as 96.8% accuracy can be achieved. Furthermore, it can achieve 92.1% accuracy when only 5% of the dataset is served as training set. Copyright © 2012 John Wiley & Sons, Ltd.

[1]  Leo Breiman,et al.  Bagging Predictors , 1996, Machine Learning.

[2]  Marcus A. Maloof,et al.  Learning to Detect and Classify Malicious Executables in the Wild , 2006, J. Mach. Learn. Res..

[3]  Leo Breiman,et al.  Random Forests , 2001, Machine Learning.

[4]  A. Ibrahim Data Mining Methods For Malware Detection Using Instruction Sequences , 2015 .

[5]  Yuval Elovici,et al.  Unknown Malcode Detection Using OPCODE Representation , 2008, EuroISI.

[6]  Peter Szor,et al.  The Art of Computer Virus Research and Defense , 2005 .

[7]  Alexander Chatzigeorgiou,et al.  Application of graph theory to OO software engineering , 2006, WISER '06.

[8]  Joohan Lee,et al.  Data mining methods for malware detection using instruction sequences , 2008 .

[9]  Salvatore J. Stolfo,et al.  Data mining methods for detection of new malicious executables , 2001, Proceedings 2001 IEEE Symposium on Security and Privacy. S&P 2001.

[10]  Jianping Yin,et al.  Malicious Codes Detection Based on Ensemble Learning , 2007, ATC.

[11]  Yuval Elovici,et al.  Detection of malicious code by applying machine learning classifiers on static features: A state-of-the-art survey , 2009, Inf. Secur. Tech. Rep..

[12]  K. Zou,et al.  Smooth non-parametric receiver operating characteristic (ROC) curves for continuous diagnostic tests. , 1997, Statistics in medicine.

[13]  Jau-Hwang Wang,et al.  Intelligent automatic malicious code signatures extraction , 2003, IEEE 37th Annual 2003 International Carnahan Conference onSecurity Technology, 2003. Proceedings..

[14]  Muhammad Zubair Shafiq,et al.  PE-Miner: Mining Structural Information to Detect Malicious Executables in Realtime , 2009, RAID.

[15]  Joohan Lee,et al.  A survey of data mining techniques for malware detection using file features , 2008, ACM-SE 46.

[16]  Lior Rokach,et al.  Improving malware detection by applying multi-inducer ensemble , 2009, Comput. Stat. Data Anal..