In-depth Comparative Evaluation of Supervised Machine Learning Approaches for Detection of Cybersecurity Threats

This paper describes the process and results of analyzing CICIDS2017, a modern, labeled data set for testing intrusion detection systems. The data set is divided into several days, each pertaining to different attack classes (Dos, DDoS, infiltration, botnet, etc.). A pipeline has been created that includes nine supervised learning algorithms. The goal was binary classification of benign versus attack traffic. Cross-validated parameter optimization, using a voting mechanism that includes five classification metrics, was employed to select optimal parameters. These results were interpreted to discover whether certain parameter choices were dominant for most (or all) of the attack classes. Ultimately, every algorithm was retested with optimal parameters to obtain the final classification scores. During the review of these results, execution time, both on consumer- and corporate-grade equipment, was taken into account as an additional requirement. The work detailed in this paper establishes a novel supervised machine learning performance baseline for CICIDS2017. Graphics of the results as well as the raw tables are publicly available at https://gitlab.ilabt.imec.be/lpdhooge/cicids2017-ml-graphics.

[1]  Stefan Axelsson,et al.  Intrusion Detection Systems: A Survey and Taxonomy , 2002 .

[2]  Erhan Guven,et al.  A Survey of Data Mining and Machine Learning Methods for Cyber Security Intrusion Detection , 2016, IEEE Communications Surveys & Tutorials.

[3]  Ali A. Ghorbani,et al.  A detailed analysis of the KDD CUP 99 data set , 2009, 2009 IEEE Symposium on Computational Intelligence for Security and Defense Applications.

[4]  Wolfgang Banzhaf,et al.  The use of computational intelligence in intrusion detection systems: A review , 2010, Appl. Soft Comput..

[5]  Robert C. Atkinson,et al.  Shallow and Deep Networks Intrusion Detection System: A Taxonomy and Survey , 2017, ArXiv.

[6]  John McHugh,et al.  The 1998 Lincoln Laboratory IDS Evaluation , 2000, Recent Advances in Intrusion Detection.

[7]  Application of distributed computing and machine learning technologies to cybersecurity , 2019 .

[8]  Bingyang Li,et al.  Distributed Abnormal Behavior Detection Approach Based on Deep Belief Network and Ensemble SVM Using Spark , 2018, IEEE Access.

[9]  Anil Somayaji,et al.  Analysis of the 1999 DARPA/Lincoln Laboratory IDS evaluation data with NetADHICT , 2009, 2009 IEEE Symposium on Computational Intelligence for Security and Defense Applications.

[10]  Gaël Varoquaux,et al.  Scikit-learn: Machine Learning in Python , 2011, J. Mach. Learn. Res..

[11]  Ali A. Ghorbani,et al.  Toward Generating a New Intrusion Detection Dataset and Intrusion Traffic Characterization , 2018, ICISSP.

[12]  Ali A. Ghorbani,et al.  Toward developing a systematic approach to generate benchmark datasets for intrusion detection , 2012, Comput. Secur..