Network Security Modeling Using NetFlow Data: Detecting Botnet Attacks in IP Traffic

Cybersecurity, and in particular the monitoring of malicious events in IP traffic, is an important field that remains largely unexplored by statisticians. Computer scientists have made significant contributions in this area, using statistical anomaly detection and supervised learning methods to detect specific malicious events. In this research, we investigate the detection of botnet command and control (C&C) hosts in massive IP traffic. We use NetFlow data, the industry standard for monitoring IP traffic, for exploratory analysis and for extracting new features. Using statistical as well as deep learning models, we develop a statistical intrusion detection system (SIDS) to predict traffic traces associated with malicious attacks. Botnet traffic signatures are derived using interpretable machine learning techniques. The models successfully detected botnet C&C hosts and compromised devices. The results were validated by matching predictions against existing blacklists of published malicious IP addresses.
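
As an illustration of what extracting features from NetFlow data can look like, the following minimal Python sketch aggregates flow records into per-host summaries. It is not the feature set used in this work: the record fields (`src`, `dst`, `bytes`, `duration`), the function name `host_features`, and the four summary statistics are assumptions chosen for the example.

```python
from collections import defaultdict
from statistics import mean


def host_features(flows):
    """Aggregate flow records into per-source-host features.

    Each record is a dict with hypothetical keys:
    src, dst (IP strings), bytes (int), duration (seconds).
    """
    # Group flow records by their source address.
    by_src = defaultdict(list)
    for flow in flows:
        by_src[flow["src"]].append(flow)

    # Summarize each host's traffic with a few simple statistics.
    features = {}
    for src, recs in by_src.items():
        features[src] = {
            "n_flows": len(recs),                        # flows initiated
            "n_dsts": len({r["dst"] for r in recs}),     # distinct peers
            "mean_bytes": mean(r["bytes"] for r in recs),
            "mean_duration": mean(r["duration"] for r in recs),
        }
    return features


# Toy records, invented for illustration only.
flows = [
    {"src": "10.0.0.1", "dst": "10.0.0.9", "bytes": 100, "duration": 1.0},
    {"src": "10.0.0.1", "dst": "10.0.0.8", "bytes": 300, "duration": 3.0},
    {"src": "10.0.0.2", "dst": "10.0.0.9", "bytes": 50, "duration": 0.5},
]
features = host_features(flows)
```

Per-host tables of this kind could then serve as input to a classifier, with labels taken from a blacklist of known malicious addresses.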
