Bridging the gap to real-world for network intrusion detection systems with data-centric approach

Most research applying machine learning (ML) to network intrusion detection systems (NIDS) relies on well-established datasets such as KDD-CUP99, NSL-KDD, UNSW-NB15, and CICIDS-2017. In this setting, work typically explores ML techniques with the goal of improving metrics over published baselines (a model-centric approach). However, these datasets have limitations, such as aging, that make it infeasible to transfer the resulting ML-based solutions to real-world applications. This paper presents a systematic data-centric approach to address the current limitations of NIDS research, specifically those of the datasets. The approach generates NIDS datasets composed of the most recent network traffic and attacks, with the labeling process integrated by design.
