Classification Hardness for Supervised Learners on 20 Years of Intrusion Detection Data

This article consolidates the analysis of an established intrusion detection dataset (NSL-KDD) and three newer ones (ISCXIDS2012, CICIDS2017, CICIDS2018) using supervised machine learning (ML) algorithms. Because every dataset is analyzed with the same procedure, the results can be compared directly, which gives a stronger foundation for conclusions about the efficacy of supervised learners on the main classification task in network security. The research is motivated in part by the lack of adoption of these modern datasets. The scope is deliberately broad, covering algorithms from different families on both established and new datasets, in order to expand the existing foundation and reveal the most opportune avenues for further inquiry. After baseline results were obtained, the classification task was made more difficult by reducing the data available to learn from, both horizontally and vertically. This data reduction serves as a stress test to verify whether the very high baseline results hold up under increasingly harsh constraints. Ultimately, this work contains the most comprehensive set of results on the topic of intrusion detection through supervised machine learning. Researchers working on algorithmic improvements can compare their results to this collection, knowing that all results reported here were gathered through a uniform framework. This work’s main contributions are the outstanding classification results on the current state-of-the-art datasets for intrusion detection and the conclusion that these methods remain remarkably resilient in classification performance even when the amount of data to learn from is aggressively reduced.
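The data-reduction stress test can be pictured with a minimal sketch. It is an illustration under stated assumptions, not the authors' framework. The abstract does not say which axis is "horizontal" and which is "vertical", so the sketch simply reduces along both: fewer rows (training samples) and fewer columns (features). The synthetic two-class data, the nearest-centroid classifier, and all function names are hypothetical stand-ins chosen to keep the example dependency-free.

```python
import random

def reduce_rows(X, y, fraction, seed=0):
    """Keep a random fraction of the training samples (fewer rows)."""
    rng = random.Random(seed)
    n_keep = max(1, int(len(X) * fraction))
    idx = sorted(rng.sample(range(len(X)), n_keep))
    return [X[i] for i in idx], [y[i] for i in idx]

def reduce_columns(X, keep):
    """Keep only the listed feature indices (fewer columns)."""
    return [[row[j] for j in keep] for row in X]

def fit_centroids(X, y):
    """Hypothetical stand-in learner: one mean vector per class."""
    sums, counts = {}, {}
    for row, label in zip(X, y):
        acc = sums.setdefault(label, [0.0] * len(row))
        for j, value in enumerate(row):
            acc[j] += value
        counts[label] = counts.get(label, 0) + 1
    return {c: [s / counts[c] for s in sums[c]] for c in sums}

def predict(centroids, row):
    """Assign the class whose centroid is closest in squared distance."""
    return min(
        centroids,
        key=lambda c: sum((a - b) ** 2 for a, b in zip(row, centroids[c])),
    )

def accuracy(centroids, X, y):
    hits = sum(predict(centroids, row) == label for row, label in zip(X, y))
    return hits / len(X)

def make_toy_flows(n, seed=0):
    """Synthetic data (an assumption): two informative features, one noise."""
    rng = random.Random(seed)
    X, y = [], []
    for i in range(n):
        label = i % 2  # 0 = benign, 1 = attack
        X.append([
            rng.gauss(4.0 * label, 1.0),
            rng.gauss(4.0 * label, 1.0),
            rng.gauss(0.0, 1.0),
        ])
        y.append(label)
    return X, y

X_train, y_train = make_toy_flows(400, seed=1)
X_test, y_test = make_toy_flows(200, seed=2)

# Baseline first, then progressively harsher reductions on both axes.
results = {}
for fraction in (1.0, 0.1, 0.01):
    for keep in ([0, 1, 2], [0]):
        X_small, y_small = reduce_rows(X_train, y_train, fraction)
        model = fit_centroids(reduce_columns(X_small, keep), y_small)
        score = accuracy(model, reduce_columns(X_test, keep), y_test)
        results[(fraction, tuple(keep))] = score
```

Each key in `results` pairs a kept fraction of training rows with a kept feature subset, so the baseline and every reduced setting can be read side by side. The test set is only reduced column-wise, so that every setting is scored on the same held-out samples.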
