FENOC: An Ensemble One-Class Learning Framework for Malware Detection

Machine learning based methods are now among the most popular approaches to malware detection. However, most previous work uses a single type of feature, either dynamic or static, to build a binary classification model. Such methods have limited ability to capture characteristic malware behaviors, and they suffer from insufficiently sampled benign programs and extremely imbalanced training datasets. In this paper, we present FENOC, an ensemble one-class learning framework for malware detection. FENOC uses hybrid features from multiple semantic layers to give a comprehensive view of the analyzed programs, and builds its detection model with CosTOC (Cost-sensitive Twin One-class Classifier), a novel one-class learning algorithm that uses a pair of one-class classifiers to describe the malware class and the benign program class respectively. CosTOC is more flexible and robust on malware detection problems, which are imbalanced and require a low false positive rate. In addition, a random subspace ensemble method is used to improve the generalization ability of CosTOC. Experimental results show that, when detecting unknown malware, FENOC achieves a higher detection rate and a lower false positive rate, especially when the training datasets are imbalanced.
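The abstract does not give the CosTOC formulation, so the following is only a minimal sketch of the general idea it describes: one one-class model per class, a cost factor that biases decisions away from false positives, and a random subspace ensemble with majority voting. The centroid-distance one-class model is a deliberately simple stand-in, and every name here (`CentroidOneClass`, `TwinOneClass`, `RandomSubspaceEnsemble`, `fp_cost`) is hypothetical rather than taken from the paper.

```python
import math
import random


class CentroidOneClass:
    """Toy one-class model: a centroid plus the mean distance to it as a scale."""

    def fit(self, rows):
        n, dim = len(rows), len(rows[0])
        self.center = [sum(r[j] for r in rows) / n for j in range(dim)]
        dists = [math.dist(r, self.center) for r in rows]
        # Fall back to 1.0 if all training rows coincide with the centroid.
        self.scale = (sum(dists) / n) or 1.0
        return self

    def score(self, row):
        # Smaller score means the row lies closer to this class description.
        return math.dist(row, self.center) / self.scale


class TwinOneClass:
    """Pair of one-class models, one per class, compared with a cost factor."""

    def __init__(self, fp_cost=1.0):
        # fp_cost > 1 makes the model more reluctant to flag a program.
        self.fp_cost = fp_cost

    def fit(self, malware_rows, benign_rows):
        self.mal = CentroidOneClass().fit(malware_rows)
        self.ben = CentroidOneClass().fit(benign_rows)
        return self

    def predict(self, row):
        # True means "malware": the malware description must win by fp_cost.
        return self.mal.score(row) * self.fp_cost < self.ben.score(row)


class RandomSubspaceEnsemble:
    """Trains each member on a random subset of features, then majority-votes."""

    def __init__(self, n_members=15, subspace_size=2, fp_cost=1.0, seed=0):
        self.n_members = n_members
        self.subspace_size = subspace_size
        self.fp_cost = fp_cost
        self.seed = seed

    def fit(self, malware_rows, benign_rows):
        rng = random.Random(self.seed)
        dim = len(malware_rows[0])
        self.members = []
        for _ in range(self.n_members):
            idx = sorted(rng.sample(range(dim), self.subspace_size))
            mal = [[r[j] for j in idx] for r in malware_rows]
            ben = [[r[j] for j in idx] for r in benign_rows]
            self.members.append((idx, TwinOneClass(self.fp_cost).fit(mal, ben)))
        return self

    def predict(self, row):
        votes = sum(m.predict([row[j] for j in idx]) for idx, m in self.members)
        return votes * 2 > len(self.members)


# Made-up feature vectors with an imbalanced class ratio (4 malware, 3 benign).
malware = [[5, 5, 5, 5], [6, 5, 4, 5], [5, 6, 5, 4], [4, 5, 6, 5]]
benign = [[0, 0, 0, 0], [1, 0, 0, 1], [0, 1, 1, 0]]

detector = RandomSubspaceEnsemble(fp_cost=2.0).fit(malware, benign)
print(detector.predict([5, 5, 5, 5]), detector.predict([0, 0, 0, 0]))
```

Because each class gets its own description, neither model depends on the other class being well sampled, which is the property the abstract attributes to the twin one-class design. In practice the centroid model would be replaced by a proper one-class learner, and `fp_cost` would be tuned against a target false positive rate.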
