Analyzing string format-based classifiers for botnet detection: GP and SVM

The domain name system (DNS) is an essential component of the Internet. Because all legitimate users and applications are expected to use it, DNS traffic is generally subject to fewer inspections, restrictions, and filters. Botnets rely on this open component to carry out their malicious operations. To avoid a single point of failure and to evade static blacklists and firewalls, they employ DNS-based methods that frequently and automatically generate new domain names. Stateful-SBB, a form of genetic programming (GP), was previously designed and developed by the authors to detect these automatically generated domain names using minimal a priori information, and was shown to be efficient. In this paper, we compare Stateful-SBB against the String Subsequence Kernel (SSK) and SSK with Lambda Pruning (SSK-LP), both of which are based on support vector machines (SVM) and also take string-format inputs. Analyzing the domain names that each classifier selects as part of its solution during classification, we find that 50% and 63% of the points frequently selected by Stateful-SBB on the Pareto front are also used by SSK and SSK-LP, respectively. By analyzing these common domain names, we identify some characteristics of botnet domain names. Moreover, we introduce a pruned version of Stateful-SBB that reduces solution complexity by 83% while maintaining the same high accuracy.
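To make the string-format comparison concrete, the following is a minimal sketch of the string subsequence kernel computed with the standard dynamic-programming recursion. It is not the implementation evaluated in this paper, and it omits lambda pruning. The function names `ssk` and `ssk_normalized`, the subsequence length `n`, the decay factor `lam`, and the example strings are all assumptions chosen for illustration.

```python
from functools import lru_cache
from math import sqrt


def ssk(s: str, t: str, n: int, lam: float) -> float:
    """String subsequence kernel of order n with gap-decay factor lam.

    Counts common (possibly non-contiguous) subsequences of length n,
    weighting each occurrence by lam raised to the length it spans.
    """

    @lru_cache(maxsize=None)
    def k_prime(i: int, ls: int, lt: int) -> float:
        # Auxiliary kernel K'_i on the prefixes s[:ls] and t[:lt].
        if i == 0:
            return 1.0
        if min(ls, lt) < i:
            return 0.0
        x = s[ls - 1]  # last character of the current prefix of s
        total = lam * k_prime(i, ls - 1, lt)
        for j in range(lt):
            if t[j] == x:
                # Gap penalty covers the tail of t after the match.
                total += k_prime(i - 1, ls - 1, j) * lam ** (lt - j + 1)
        return total

    # K_n sums, over every matching character pair, the auxiliary kernel
    # of the prefixes before that pair, times lam^2 for the pair itself.
    total = 0.0
    for p in range(len(s)):
        for j in range(len(t)):
            if s[p] == t[j]:
                total += lam ** 2 * k_prime(n - 1, p, j)
    return total


def ssk_normalized(s: str, t: str, n: int, lam: float) -> float:
    """Kernel value scaled so that a string compared with itself gives 1."""
    denom = sqrt(ssk(s, s, n, lam) * ssk(t, t, n, lam))
    return ssk(s, t, n, lam) / denom if denom > 0 else 0.0


if __name__ == "__main__":
    # Hypothetical inputs; real use would pass domain-name strings.
    print(ssk("cat", "car", n=2, lam=0.5))
    print(ssk_normalized("cat", "car", n=2, lam=0.5))
```

With `n = 2`, the strings "cat" and "car" share only the subsequence "ca", so the unnormalized kernel equals `lam**4`. A kernel of this kind lets an SVM operate directly on domain-name strings without a hand-built feature vector, which is the property the comparison with Stateful-SBB relies on.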
