Incremental Learning for Malware Classification in Small Datasets

Malware classification is a central task in information security. In the real world, malware datasets are open-ended and dynamic: new samples keep arriving, some belonging to known classes and some to classes not seen before. A malware classification method therefore needs to support incremental learning, so that it can learn new knowledge efficiently. However, existing work mainly focuses on feature engineering and treats machine learning only as a tool, leaving incremental learning largely unaddressed. To address this gap, we present an incremental malware classification framework, named “IMC,” which consists of opcode sequence extraction, selection, and an incremental learning method. As the core component of IMC, we develop an incremental learning method based on the multiclass support vector machine (SVM), named “IMCSVM,” which incrementally improves its classification ability by learning from new malware samples. When new samples arrive, IMCSVM adds new classification planes (if the samples belong to a new class) and updates all existing classification planes. As a result, IMC improves the classification quality for known malware classes by minimizing the prediction error, and it transfers the knowledge in the old model to classify previously unknown malware classes. We apply the incremental learning method to malware classification, and the experimental results demonstrate the advantages and effectiveness of IMC.
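The update rule described above (add a plane for each unseen class, then update every plane on the new samples) can be illustrated with a small sketch. This is not the IMCSVM algorithm itself, whose details the abstract does not give. It assumes opcode bigram counts as features and swaps in a one-vs-rest linear classifier trained with unregularized hinge-loss updates, which is closer to a margin perceptron than to a full SVM. The names `opcode_ngrams`, `IncrementalOvR`, `partial_fit`, and the `family_*` labels are hypothetical.

```python
from collections import Counter
import random


def opcode_ngrams(opcodes, n=2):
    """Count opcode n-grams in one sample (a sparse feature vector)."""
    return Counter(tuple(opcodes[i:i + n]) for i in range(len(opcodes) - n + 1))


class IncrementalOvR:
    """One-vs-rest linear classifier with hinge-loss updates (illustrative only)."""

    def __init__(self, lr=0.1):
        self.lr = lr
        self.weights = {}  # class label -> {n-gram: weight}
        self.bias = {}     # class label -> bias term

    def score(self, label, x):
        w = self.weights[label]
        return self.bias[label] + sum(w.get(f, 0.0) * v for f, v in x.items())

    def partial_fit(self, batch, max_epochs=200, seed=0):
        batch = list(batch)  # copy so the caller's list is not shuffled
        rng = random.Random(seed)
        for _, label in batch:
            if label not in self.weights:  # unseen class: add a new plane
                self.weights[label] = {}
                self.bias[label] = 0.0
        for _ in range(max_epochs):
            rng.shuffle(batch)
            updated = False
            for x, label in batch:
                for cls in self.weights:  # update every plane, old and new
                    y = 1.0 if cls == label else -1.0
                    if y * self.score(cls, x) < 1.0:  # margin violated
                        w = self.weights[cls]
                        for f, v in x.items():
                            w[f] = w.get(f, 0.0) + self.lr * y * v
                        self.bias[cls] += self.lr * y
                        updated = True
            if not updated:  # every sample in the batch has margin >= 1
                break
        return self

    def predict(self, x):
        return max(self.weights, key=lambda cls: self.score(cls, x))


def sample(opcodes, label):
    return opcode_ngrams(opcodes.split()), label


# Toy opcode sequences; real inputs would come from a disassembler.
first_batch = [
    sample("push mov call push mov call", "family_a"),
    sample("push mov call ret", "family_a"),
    sample("xor jmp xor jmp xor", "family_b"),
    sample("xor jmp xor ret", "family_b"),
]
model = IncrementalOvR().partial_fit(first_batch)

# Later batch: new samples of known classes plus a previously unseen class.
second_batch = [
    sample("push mov call push mov ret", "family_a"),
    sample("jmp xor jmp xor", "family_b"),
    sample("add sub add sub", "family_c"),
    sample("add sub add ret", "family_c"),
]
model.partial_fit(second_batch)

print(sorted(model.weights))
print([model.predict(x) for x, _ in second_batch])
```

The second call to `partial_fit` never sees the first batch again, so this sketch does nothing to protect the margins of earlier samples. The second batch mixes known and new classes so that the new plane also receives negative examples.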
