Benchmarking of Machine Learning for Anomaly Based Intrusion Detection Systems in the CICIDS2017 Dataset

An intrusion detection system (IDS) is an important protection instrument for detecting complex network attacks. Various machine learning (ML) and deep learning (DL) algorithms have been proposed for implementing anomaly-based IDS (AIDS). Our review of the AIDS literature identifies several issues in related work. These include the arbitrary choice of algorithms, parameters, and testing criteria, the use of outdated datasets, and shallow analysis and validation of results. This paper comprehensively reviews previous studies on AIDS against a set of criteria that covers different datasets and attack types. It establishes benchmarking outcomes that can reveal suitable AIDS algorithms, parameters, and testing criteria. Specifically, this paper applies 10 popular supervised and unsupervised ML algorithms to identify effective and efficient ML-AIDS for networks and computers. The supervised algorithms are the artificial neural network (ANN), decision tree (DT), k-nearest neighbor (k-NN), naive Bayes (NB), random forest (RF), support vector machine (SVM), and convolutional neural network (CNN). The unsupervised algorithms are expectation-maximization (EM), k-means, and self-organizing maps (SOM). Several models of each algorithm are introduced, and the tuning and training parameters of each algorithm are examined to obtain an optimal classifier. Unlike previous studies, this study evaluates 31 ML-AIDS models by measuring their true positive rate, true negative rate, accuracy, precision, recall, and F-score. Time complexity is an important factor in AIDS, so the training and testing times of the ML-AIDS models are also measured to assess their efficiency. The ML-AIDS models are tested on CICIDS2017, a recent, highly imbalanced multiclass dataset that includes real-world network attacks.
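On a multiclass dataset, the evaluation measures listed above are usually computed per class, one-vs-rest. The class of interest counts as positive and every other label, including benign traffic, counts as negative. The sketch below is an illustrative assumption, not the paper's evaluation code. The helper names (`safe_div`, `one_vs_rest_metrics`, `per_class_report`) and the toy labels are hypothetical. The sketch also shows why per-class scores matter on an imbalanced dataset: a class can be missed entirely while its accuracy still looks high.

```python
# Minimal sketch: one-vs-rest evaluation metrics for a multiclass IDS.
# Helper names and toy labels are hypothetical, for illustration only.

def safe_div(num, den):
    """Return num/den, or 0.0 when the denominator is zero (e.g. a class never predicted)."""
    return num / den if den else 0.0


def one_vs_rest_metrics(y_true, y_pred, positive):
    """Score one class: `positive` counts as positive, every other label as negative."""
    tp = fp = tn = fn = 0
    for t, p in zip(y_true, y_pred):
        if p == positive:
            if t == positive:
                tp += 1
            else:
                fp += 1
        elif t == positive:
            fn += 1
        else:
            tn += 1
    recall = safe_div(tp, tp + fn)        # true positive rate (TPR)
    tnr = safe_div(tn, tn + fp)           # true negative rate (specificity)
    precision = safe_div(tp, tp + fp)
    accuracy = safe_div(tp + tn, tp + tn + fp + fn)
    f_score = safe_div(2 * precision * recall, precision + recall)
    return {"TPR": recall, "TNR": tnr, "accuracy": accuracy,
            "precision": precision, "recall": recall, "F-score": f_score}


def per_class_report(y_true, y_pred):
    """Apply one_vs_rest_metrics to every label that occurs in the ground truth."""
    return {label: one_vs_rest_metrics(y_true, y_pred, label)
            for label in sorted(set(y_true))}


# Illustrative toy labels (not CICIDS2017 data).
y_true = ["BENIGN", "BENIGN", "DoS", "DoS", "WebAttack", "BENIGN"]
y_pred = ["BENIGN", "DoS", "DoS", "DoS", "BENIGN", "BENIGN"]

for label, scores in per_class_report(y_true, y_pred).items():
    print(label, {name: round(value, 3) for name, value in scores.items()})
```

With these toy labels, the WebAttack class has a recall of 0 while its one-vs-rest accuracy is about 0.83. Accuracy alone can therefore be misleading on an imbalanced dataset.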
In general, the k-NN-AIDS, DT-AIDS, and NB-AIDS models obtain the best results. They also show a greater capability to detect web attacks than the other models, which produce irregular and inferior results.
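Training and testing times behave very differently across the algorithm families listed above. The sketch below times a brute-force k-NN classifier, one of the best-performing model families reported here, as a hedged illustration. k-NN only stores the training samples, so almost all of its cost falls on testing, where every test flow is compared with every stored sample. The class and function names (`KNNClassifier`, `timed_benchmark`), k=3, Euclidean distance, and the toy data are assumptions for illustration. The paper's actual k-NN configuration and tooling are not given in this abstract. The sketch requires Python 3.8+ for `math.dist`.

```python
# Minimal sketch: brute-force k-NN with separately timed training and testing.
# Class/function names and toy data are hypothetical, for illustration only.
import math
import time
from collections import Counter


class KNNClassifier:
    """Brute-force k-nearest-neighbor classifier (Euclidean distance, majority vote)."""

    def __init__(self, k=3):
        self.k = k
        self._X = []
        self._y = []

    def fit(self, X, y):
        # k-NN is a lazy learner: "training" only stores the labelled samples.
        self._X = [[float(v) for v in row] for row in X]
        self._y = list(y)
        return self

    def predict_one(self, x):
        # Testing is the expensive step: distance to every stored sample, then sort.
        neighbours = sorted(
            (math.dist(x, row), label) for row, label in zip(self._X, self._y)
        )
        votes = Counter(label for _, label in neighbours[: self.k])
        return votes.most_common(1)[0][0]

    def predict(self, X):
        return [self.predict_one(x) for x in X]


def timed_benchmark(model, X_train, y_train, X_test):
    """Fit and predict, returning predictions plus wall-clock train/test seconds."""
    start = time.perf_counter()
    model.fit(X_train, y_train)
    train_seconds = time.perf_counter() - start

    start = time.perf_counter()
    predictions = model.predict(X_test)
    test_seconds = time.perf_counter() - start
    return predictions, train_seconds, test_seconds


# Illustrative toy feature vectors (assumed already numeric and scaled).
X_train = [[0, 0], [0, 1], [1, 0], [10, 10], [10, 11], [11, 10]]
y_train = ["BENIGN"] * 3 + ["DoS"] * 3
X_test = [[0.5, 0.5], [10.5, 10.5]]

y_pred, train_s, test_s = timed_benchmark(KNNClassifier(k=3), X_train, y_train, X_test)
print(y_pred, f"train={train_s:.6f}s", f"test={test_s:.6f}s")
```

`time.perf_counter` is used because it is a monotonic, high-resolution clock in the Python standard library. A real benchmark would normally use an optimized library implementation and average over repeated runs.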
