Flow based botnet detection through semi-supervised active learning

In many network-based intrusion detection system (NIDS) applications, the goal is to detect groups of packet flows from unknown attacks (e.g., botnets), where a group may manifest its atypicality (relative to a known reference “normal”/null model) on only a low-dimensional subset of the full set of features measured by the IDS. This anomaly detection problem is challenging because it is a priori unknown which (possibly sparse) subset of features jointly characterizes a particular application, especially one that has not been seen before and thus represents an unknown behavioral class (a zero-day threat). Moreover, botnets have become evasive, evolving their behavior to avoid signature-based IDSs. In this work, we apply a novel active learning (AL) framework to botnet detection, facilitating the detection of unknown botnets (i.e., assuming no ground-truth examples of them are available). We propose a new anomaly-based feature set that captures the informative features and exploits the sequence of packet directions in a given flow. Experiments on real-world network traffic data, including several common Zeus botnet instances, demonstrate the advantage of our proposed features and AL system.
