A Study on the Effect of Feature Selection on Malware Analysis using Machine Learning

This paper studies the effect of feature selection on malware detection using machine learning techniques. We apply supervised and unsupervised machine learning algorithms, covering both classification and clustering, with and without feature selection. The algorithms are compared for effectiveness and efficiency using predictive accuracy, among other performance metrics. We observe that the best detection rate was attained by supervised learning with feature selection; the supervised learning algorithm used was the Multilayer Perceptron (MLP). The analysis also reveals that our system can detect viruses from varying sources.
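To make the idea of feature selection concrete, the following is a minimal sketch of one common filter method: ranking binary features by information gain and keeping the top k. The abstract does not say which selection technique, dataset, or tooling the study used, so the choice of information gain, the synthetic data, and all names here (`information_gain`, `select_top_k`, `make_synthetic`) are illustrative assumptions rather than the authors' method.

```python
import math
import random


def entropy(labels):
    """Shannon entropy (in bits) of a list of 0/1 labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    p = sum(labels) / n
    return -sum(q * math.log2(q) for q in (p, 1 - p) if q > 0)


def information_gain(column, labels):
    """Reduction in label entropy obtained by splitting on one binary feature."""
    n = len(labels)
    gain = entropy(labels)
    for value in (0, 1):
        subset = [y for x, y in zip(column, labels) if x == value]
        gain -= len(subset) / n * entropy(subset)
    return gain


def select_top_k(rows, labels, k):
    """Return the indices of the k features with the highest information gain."""
    n_features = len(rows[0])
    gains = [
        information_gain([row[j] for row in rows], labels)
        for j in range(n_features)
    ]
    return sorted(range(n_features), key=lambda j: gains[j], reverse=True)[:k]


def make_synthetic(n_samples=400, n_noise=8, seed=0):
    """Hypothetical data: features 0 and 1 track the label, the rest are noise."""
    rng = random.Random(seed)
    rows, labels = [], []
    for _ in range(n_samples):
        y = rng.randint(0, 1)  # 1 = malicious, 0 = benign (assumed encoding)
        # Each informative feature agrees with the label about 90% of the time.
        informative = [y if rng.random() < 0.9 else 1 - y for _ in range(2)]
        noise = [rng.randint(0, 1) for _ in range(n_noise)]
        rows.append(informative + noise)
        labels.append(y)
    return rows, labels


rows, labels = make_synthetic()
selected = select_top_k(rows, labels, k=2)
# Reduced dataset that a classifier would be trained on after selection.
reduced = [[row[j] for j in selected] for row in rows]
```

In a comparison of the kind described above, the same classifier would be trained once on `rows` and once on `reduced`, and the two runs compared on accuracy and training time.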
