Security Hardening of Botnet Detectors Using Generative Adversarial Networks

Like traditional machine learning (ML) models, ML-based botnet detectors are vulnerable to adversarial evasion attacks. The datasets used to train these detectors also suffer from scarcity and class imbalance. We propose a new technique named Botshot, based on generative adversarial networks (GANs), that addresses these issues and proactively makes botnet detectors aware of adversarial evasions. Botshot is cost-effective compared with network emulation for generating botnet traffic data, because it makes dedicated hardware resources unnecessary. First, we use an extended set of network flow and time-based features for three publicly available botnet datasets. Second, we use two GANs (vanilla and conditional) to generate realistic botnet traffic. We evaluate generator performance using the classifier two-sample test (C2ST) with a 10-fold 70-30 train-test split, and propose the use of recall rather than accuracy for proactively learning adversarial evasions. We then augment the training set with the generated data and test on the unchanged test set. Last, we compare our results with benchmark oversampling methods with augmentation of additional botnet traffic data, in terms of average accuracy, precision, recall, and F1 score over six different ML classifiers. The empirical results demonstrate the effectiveness of GAN-based oversampling for learning adversarial evasion attacks on botnet detectors in advance.
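The classifier two-sample test mentioned above can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes that "10-fold 70-30" means ten repeated random 70-30 splits, uses a plain 1-nearest-neighbour classifier as a stand-in for whatever classifier the paper uses, and treats the generated class as the positive class when computing recall. The names `c2st`, `nearest_neighbour_label`, and `gaussian_rows` are hypothetical, and the Gaussian rows are toy data rather than botnet flows.

```python
import random


def nearest_neighbour_label(train_x, train_y, point):
    """Return the label of the training row closest to `point` (1-NN)."""
    def sq_dist(row):
        return sum((a - b) ** 2 for a, b in zip(row, point))
    best = min(range(len(train_x)), key=lambda i: sq_dist(train_x[i]))
    return train_y[best]


def c2st(real, generated, n_repeats=10, test_frac=0.3, seed=0):
    """Classifier two-sample test over repeated random train-test splits.

    Returns (mean accuracy, mean recall on the generated class).
    Accuracy near 0.5 suggests the classifier cannot tell the samples apart.
    """
    rng = random.Random(seed)
    # Assumption: label 0 = real flow, label 1 = generated flow.
    rows = [(x, 0) for x in real] + [(x, 1) for x in generated]
    accuracies, recalls = [], []
    for _ in range(n_repeats):
        rng.shuffle(rows)
        n_test = max(1, round(test_frac * len(rows)))
        test, train = rows[:n_test], rows[n_test:]
        train_x = [x for x, _ in train]
        train_y = [y for _, y in train]
        preds = [nearest_neighbour_label(train_x, train_y, x) for x, _ in test]
        truth = [y for _, y in test]
        accuracies.append(sum(p == t for p, t in zip(preds, truth)) / len(truth))
        n_generated = sum(t == 1 for t in truth)
        if n_generated:  # skip recall when the split holds no generated rows
            hits = sum(p == 1 and t == 1 for p, t in zip(preds, truth))
            recalls.append(hits / n_generated)

    def mean(values):
        return sum(values) / len(values) if values else float("nan")

    return mean(accuracies), mean(recalls)


def gaussian_rows(rng, n, centre, dim=2):
    """Toy feature rows drawn from a unit-variance Gaussian around `centre`."""
    return [[rng.gauss(centre, 1.0) for _ in range(dim)] for _ in range(n)]


if __name__ == "__main__":
    rng = random.Random(42)
    real = gaussian_rows(rng, 100, 0.0)
    close_fake = gaussian_rows(rng, 100, 0.0)   # same distribution as real
    far_fake = gaussian_rows(rng, 100, 10.0)    # clearly different distribution
    print("close:", c2st(real, close_fake))
    print("far:  ", c2st(real, far_fake))
```

In this toy setting, a generator whose output matches the real distribution should yield accuracy close to chance, while a poor generator is separated almost perfectly. Reporting recall alongside accuracy shows how many generated samples the discriminating classifier actually catches, which is the quantity of interest when the generated samples stand in for evasive traffic.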
