Towards a Reliable Comparison and Evaluation of Network Intrusion Detection Systems Based on Machine Learning Approaches

We live in a hyper-connected world where millions of heterogeneous devices continuously share information across application contexts such as wellness, communications and digital business. However, the larger the number of devices and connections, the higher the risk of security threats. To counteract malicious behaviours and preserve essential security services, Network Intrusion Detection Systems (NIDSs) are the most widely used line of defence in communications networks. Nevertheless, there is no standard methodology to evaluate and fairly compare NIDSs. Most proposals omit crucial steps of NIDS validation, which makes comparing them hard or even impossible. This work first presents a comprehensive study of recent NIDSs based on machine learning approaches, concluding that almost none of them comply with what the authors of this paper consider mandatory steps for a reliable comparison and evaluation of NIDSs. Second, a structured methodology is proposed and assessed on the UGR’16 dataset to test its suitability for addressing network attack detection problems. The recommended guidelines and steps will definitely help the research community to assess NIDSs fairly, although building a definitive framework is not a trivial task and, therefore, further effort is still needed to improve its understandability and usability.
