Clustering botnet communication traffic based on n-gram feature selection

Recognized as one the most serious security threats on current Internet infrastructure, botnets can not only be implemented by existing well known applications, e.g. IRC, HTTP, or Peer-to-Peer, but also can be constructed by unknown or creative applications, which makes the botnet detection a challenging problem. Previous attempts for detecting botnets are mostly to examine traffic content for bot command on selected network links or by setting up honeypots. Traffic content, however, can be encrypted with the evolution of botnet, and as a result leading to a fail of content based detection approaches. In this paper, we address this issue and propose a new approach for detecting and clustering botnet traffic on large-scale network application communities, in which we first classify the network traffic into different applications by using traffic payload signatures, and then a novel decision tree model is used to classify those traffic to be unknown by the payload content (e.g. encrypted traffic) into known application communities where network traffic is clustered based on n-gram features selected and extracted from the content of network flows in order to differentiate the malicious botnet traffic created by bots from normal traffic generated by human beings on each specific application. We evaluate our approach with seven different traffic trace collected on three different network links and results show the proposed approach successfully detects two IRC botnet traffic traces with a high detection rate and an acceptable low false alarm rate.

[1]  Neil Daswani,et al.  The Anatomy of Clickbot.A , 2007, HotBots.

[2]  R. Villamarin-Salomon,et al.  Identifying Botnets Using Anomaly Detection Techniques Applied to DNS Traffic , 2008, 2008 5th IEEE Consumer Communications and Networking Conference.

[3]  Bhavani Thuraisingham,et al.  Peer to peer botnet detection for cyber-security: a data mining approach , 2008, CSIIRW '08.

[4]  Salvatore J. Stolfo,et al.  Anomalous Payload-Based Network Intrusion Detection , 2004, RAID.

[5]  Andrew W. Moore,et al.  X-means: Extending K-means with Efficient Estimation of the Number of Clusters , 2000, ICML.

[6]  José Carlos Brustoloni,et al.  Bayesian bot detection based on DNS traffic similarity , 2009, SAC '09.

[7]  Suresh Singh,et al.  An Algorithm for Anomaly-based Botnet Detection , 2006, SRUTI.

[8]  Guofei Gu,et al.  BotSniffer: Detecting Botnet Command and Control Channels in Network Traffic , 2008, NDSS.

[9]  Mitsuaki Akiyama,et al.  A Proposal of Metrics for Botnet Detection Based on Its Cooperative Behavior , 2007, 2007 International Symposium on Applications and the Internet Workshops.

[10]  Heejo Lee,et al.  Botnet Detection by Monitoring Group Activities in DNS Traffic , 2007, 7th IEEE International Conference on Computer and Information Technology (CIT 2007).

[11]  Thorsten Holz,et al.  Rishi: Identify Bot Contaminated Hosts by IRC Nickname Evaluation , 2007, HotBots.

[12]  Ram Dantu,et al.  Email Shape Analysis for Spam Botnet Detection , 2009, 2009 6th IEEE Consumer Communications and Networking Conference.

[13]  Alex Brodsky,et al.  A Distributed Content Independent Method for Spam Detection , 2007, HotBots.

[14]  Jia Wang,et al.  Analyzing peer-to-peer traffic across large networks , 2002, IMW '02.

[15]  Wenke Lee,et al.  Botnet Detection: Countering the Largest Security Threat , 2010, Botnet Detection.

[16]  Guofei Gu,et al.  BotMiner: Clustering Analysis of Network Traffic for Protocol- and Structure-Independent Botnet Detection , 2008, USENIX Security Symposium.

[17]  Renata Teixeira,et al.  Early Recognition of Encrypted Applications , 2007, PAM.

[18]  Vinod Yegneswaran,et al.  BotHunter: Detecting Malware Infection Through IDS-Driven Dialog Correlation , 2007, USENIX Security Symposium.

[19]  Kouichi Sakurai,et al.  Bot Detection Based on Traffic Analysis , 2007 .

[20]  Felix C. Freiling,et al.  Measurements and Mitigation of Peer-to-Peer-based Botnets: A Case Study on Storm Worm , 2008, LEET.

[21]  Konstantina Papagiannaki,et al.  Toward the Accurate Identification of Network Applications , 2005, PAM.

[22]  Eleazar Eskin,et al.  Anomaly Detection over Noisy Data using Learned Probability Distributions , 2000, ICML.

[23]  Luca Salgarelli,et al.  Comparing traffic classifiers , 2007, CCRV.

[24]  Yao Zhao,et al.  BotGraph: Large Scale Spamming Botnet Detection , 2009, NSDI.

[25]  W. Timothy Strayer,et al.  Using Machine Learning Techniques to Identify Botnet Traffic , 2006 .

[26]  Carey L. Williamson,et al.  Offline/realtime traffic classification using semi-supervised learning , 2007, Perform. Evaluation.

[27]  Ken Chiang,et al.  A Case Study of the Rustock Rootkit and Spam Bot , 2007, HotBots.

[28]  Renata Teixeira,et al.  Traffic classification on the fly , 2006, CCRV.

[29]  J. Ross Quinlan,et al.  C4.5: Programs for Machine Learning , 1992 .

[30]  Michalis Faloutsos,et al.  BLINC: multilevel traffic classification in the dark , 2005, SIGCOMM '05.

[31]  Zhaoxin Zhang,et al.  A Novel Approach to Detect IRC-Based Botnets , 2009, 2009 International Conference on Networks Security, Wireless Communications and Trusted Computing.

[32]  John C. Mitchell,et al.  Towards Systematic Evaluation of the Evadability of Bot/Botnet Detection Methods , 2008, WOOT.