Flow-Data Gathering Using NetFlow Sensors for Fitting Malicious-Traffic Detection Models

Advanced persistent threats (APTs) are a growing concern in cybersecurity, and many companies and governments have reported incidents related to them. Throughout the life cycle of an APT, network attacks are among the most commonly used techniques for gaining access. Tools based on machine learning are effective in detecting these attacks, but researchers often struggle to find suitable datasets for fitting their models, and the problem is even harder when flow data are required. In this paper, we describe a framework for gathering flow datasets using a NetFlow sensor. We also present DOROTHEA (Docker-based framework for gathering NetFlow data), a Docker-based solution that implements this framework. The tool aims to make it easy to generate taggable network traffic from which suitable datasets for fitting classification models can be built. To demonstrate that datasets gathered with DOROTHEA can be used to fit classification models for malicious-traffic detection, we built several models using the Model Evaluator (MoEv), a general-purpose tool for training machine-learning algorithms. In the experiments, four models obtained detection rates higher than 93%, which supports the validity of the datasets gathered with the tool.
