Veto-based Malware Detection

Malicious software (malware) threatens the security and privacy of computer users. Traditional signature-based and heuristic-based methods fail to detect some forms of malware. This paper presents a malware detection approach based on supervised learning. Its main contributions are an ensemble learning algorithm, two pre-processing techniques, and an empirical evaluation of the proposed algorithm. Sequences of operation codes (opcodes) are extracted as features from malware and benign files and used to produce three data sets with different configurations. A set of learning algorithms is evaluated on these data sets, and their predictions are combined by the ensemble algorithm, which decides the final output by veto voting. The experimental results show that the approach accurately detects both novel and known malware instances, with higher recall than majority voting.
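
To illustrate why veto voting can yield higher recall than majority voting, the following minimal sketch contrasts the two decision rules. It is not the paper's algorithm: it assumes the simplest possible veto rule (a single "malware" vote overrides all "benign" votes), and the label names and function names are hypothetical.

```python
from collections import Counter
from typing import Sequence

# Hypothetical label names, chosen for illustration only.
MALWARE = "malware"
BENIGN = "benign"


def majority_vote(predictions: Sequence[str]) -> str:
    """Return the label predicted by most classifiers.

    Assumption: ties are resolved in favour of MALWARE.
    """
    counts = Counter(predictions)
    return MALWARE if counts[MALWARE] >= counts[BENIGN] else BENIGN


def veto_vote(predictions: Sequence[str]) -> str:
    """Return MALWARE if any single classifier predicts MALWARE.

    Assumption: this is the simplest veto rule; the paper's ensemble
    may apply a more refined criterion.
    """
    return MALWARE if MALWARE in predictions else BENIGN


# Three base classifiers disagree on one file.
votes = [BENIGN, BENIGN, MALWARE]
print(majority_vote(votes))  # benign
print(veto_vote(votes))      # malware
```

Under this assumed rule, a file is labelled benign only when every classifier agrees, so fewer malware instances are missed (higher recall), at the likely cost of more false alarms on benign files.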
