A Consolidated Decision Tree-Based Intrusion Detection System for Binary and Multiclass Imbalanced Datasets

The widespread adoption and growth of the Internet and mobile technologies have revolutionized our lives. At the same time, the world is witnessing, and suffering from, technologically aided crime. These threats, including but not limited to hacking and intrusions, are the main concern of security experts, and the challenges facing effective intrusion detection remain an active research interest. This paper’s main contribution is a host-based intrusion detection system (IDS) that uses a C4.5-based detector on top of the popular Consolidated Tree Construction (CTC) algorithm, which works efficiently in the presence of class-imbalanced data. An improved version of the random sampling mechanism, called Supervised Relative Random Sampling (SRRS), is proposed to generate a balanced sample from a highly class-imbalanced dataset at the detector’s pre-processing stage. Moreover, an improved multi-class feature selection mechanism has been designed and developed as a filter component to select the most relevant features of the IDS datasets for efficient intrusion detection. The proposed IDS has been validated against state-of-the-art intrusion detection systems. The results show accuracies of 99.96% and 99.95% on the NSL-KDD dataset and the CICIDS2017 dataset, using 34 features.
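The idea of drawing a balanced sample from a class-imbalanced dataset can be illustrated with a minimal sketch. This is not the SRRS algorithm, whose details are not given here; it is plain per-class random undersampling, and the function name `balanced_sample` and the parameters `per_class` and `seed` are hypothetical names chosen for illustration.

```python
import random
from collections import defaultdict


def balanced_sample(records, labels, per_class, seed=0):
    """Draw up to `per_class` records from every class, uniformly at random.

    Illustrative only: generic per-class undersampling, NOT the paper's SRRS.
    Returns a shuffled list of (record, label) pairs.
    """
    rng = random.Random(seed)

    # Group the records by their class label.
    by_class = defaultdict(list)
    for rec, lab in zip(records, labels):
        by_class[lab].append(rec)

    sample = []
    for lab in sorted(by_class):
        members = by_class[lab]
        # A rare class may have fewer than `per_class` members; keep them all.
        k = min(per_class, len(members))
        for rec in rng.sample(members, k):
            sample.append((rec, lab))

    # Shuffle so the classes are not grouped together in the output.
    rng.shuffle(sample)
    return sample
```

For example, with 1000 "normal", 20 "dos", and 5 "u2r" records and `per_class=20`, the sample keeps 20 "normal" records, all 20 "dos" records, and all 5 "u2r" records, so the majority class no longer dominates the training data.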
