Using Machine Learning Techniques to Identify Botnet Traffic

To date, techniques to counter cyber-attacks have been predominantly reactive: they focus on monitoring network traffic, detecting anomalies and cyber-attack traffic patterns, and then, a posteriori, combating the cyber-attacks and mitigating their effects. In contrast to such approaches, we advocate proactively detecting and identifying botnets before they are used as part of a cyber-attack (Strayer et al., 2006). In this paper, we present our work on using machine learning-based classification techniques to identify the command and control (C2) traffic of IRC-based botnets, that is, groups of compromised hosts that are collectively commanded via Internet Relay Chat (IRC). We split this task into two stages: (I) distinguishing between IRC and non-IRC traffic, and (II) distinguishing between botnet and real IRC traffic. For stage I, we compare the performance of J48, naive Bayes, and Bayesian network classifiers, identify the features that achieve good overall classification accuracy, and determine how sensitive the classification is to the size of the training set. Although they are sensitive to the training data and to the attributes used to characterize communication flows, machine learning-based classifiers show promise in identifying IRC traffic. Using classification in stage II is trickier, because accurately labeling IRC traffic as botnet or non-botnet is challenging. We are currently exploring labeling flows as suspicious or non-suspicious based on telltale signs that hosts have been compromised.
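To make the stage I task concrete, here is a minimal sketch of one of the three classifier families compared above: a Gaussian naive Bayes written from scratch in Python. It is not the authors' implementation or feature set. The per-flow attributes (mean packet size, flow duration, packets per second), the toy training flows, and the names `GaussianNaiveBayes`, `classify_two_stage`, `TRAIN`, and the label strings are all hypothetical. `classify_two_stage` shows how stage II would be consulted only for flows that stage I labels as IRC. Its stage II model is assumed to be any object with a `predict` method, for example a classifier trained on suspicious/non-suspicious labels.

```python
import math
from collections import defaultdict


class GaussianNaiveBayes:
    """Naive Bayes with one Gaussian per (class, attribute) pair."""

    def fit(self, flows, labels):
        grouped = defaultdict(list)
        for x, y in zip(flows, labels):
            grouped[y].append(x)
        self.priors, self.stats = {}, {}
        for label, rows in grouped.items():
            self.priors[label] = len(rows) / len(flows)
            self.stats[label] = []
            for column in zip(*rows):  # one column per flow attribute
                mean = sum(column) / len(column)
                var = sum((v - mean) ** 2 for v in column) / len(column)
                # Floor the variance so a constant attribute cannot cause division by zero.
                self.stats[label].append((mean, max(var, 1e-9)))
        return self

    def log_posterior(self, x, label):
        # log P(label) + sum_i log N(x_i; mean_i, var_i), assuming the
        # attributes are independent given the class (the "naive" assumption).
        score = math.log(self.priors[label])
        for v, (mean, var) in zip(x, self.stats[label]):
            score -= 0.5 * math.log(2 * math.pi * var) + (v - mean) ** 2 / (2 * var)
        return score

    def predict(self, x):
        return max(self.priors, key=lambda label: self.log_posterior(x, label))


def classify_two_stage(flow, stage1, stage2):
    """Stage I separates IRC from non-IRC; stage II only sees flows labeled IRC."""
    if stage1.predict(flow) != "irc":
        return "non-irc"
    return stage2.predict(flow)


# Hypothetical toy flows: (mean packet size in bytes, duration in s, packets/s).
# The values are invented for illustration and are not measurements from the paper.
TRAIN = [
    ((80.0, 600.0, 0.2), "irc"),
    ((95.0, 900.0, 0.1), "irc"),
    ((70.0, 1200.0, 0.3), "irc"),
    ((1200.0, 5.0, 40.0), "non-irc"),
    ((1400.0, 2.0, 80.0), "non-irc"),
    ((900.0, 10.0, 25.0), "non-irc"),
]

stage1 = GaussianNaiveBayes().fit([x for x, _ in TRAIN], [y for _, y in TRAIN])
print(stage1.predict((85.0, 800.0, 0.15)))  # irc
```

Swapping in a decision tree (J48 is Weka's implementation of C4.5) or a Bayesian network would change only `stage1`. As the abstract notes, how well any of these classifiers performs depends on the attributes chosen to describe each flow and on the training data.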

[1] Andrew W. Moore, et al. Internet traffic classification using Bayesian analysis techniques, 2005, SIGMETRICS '05.

[2] Matthew Roughan, et al. Class-of-service mapping for QoS: a statistical signature-based approach to IP traffic classification, 2004, IMC '04.

[3] Tristan Henderson, et al. The changing usage of a mature campus-wide wireless network, 2004, MobiCom '04.

[4] Cyrus Peikari, et al. Security Warrior, 2004.

[5] Bill McCarty, et al. Automated Identity Theft, 2003, IEEE Secur. Priv.

[6] Anja Feldmann, et al. An analysis of Internet chat systems, 2003, IMC '03.

[7] W. Timothy Strayer, et al. Detecting Botnets with Tight Command and Control, 2006, Proceedings. 2006 31st IEEE Conference on Local Computer Networks.

[8] Elias Levy. The Making of a Spam Zombie Army: Dissecting the Sobig Worms, 2003, IEEE Secur. Priv.

[9] David G. Stork, et al. Pattern Classification, 1973.

[10] Elias Levy. The Making of a Spam Zombie Army, 2003, S&P 2003.

[11] Oliver Spatscheck, et al. Accurate, scalable in-network identification of p2p traffic using application signatures, 2004, WWW '04.

[12] Bill McCarty, et al. Botnets: Big and Bigger, 2003, IEEE Secur. Priv.

[13] Shigeo Abe. Pattern Classification, 2001, Springer London.

[14] Thorsten Holz. A Short Visit to the Bot Zoo, 2005, IEEE Secur. Priv.

[15] Ian H. Witten, et al. Data mining: practical machine learning tools and techniques, 3rd Edition, 1999.