An effective convolutional neural network based on SMOTE and Gaussian mixture model for intrusion detection in imbalanced dataset

Abstract Network Intrusion Detection System (NIDS) is a key security device in modern networks to detect malicious activities. However, the problem of imbalanced class associated with intrusion detection dataset limits the classifier’s performance for minority classes. To improve the detection rate of minority classes while ensuring efficiency, we propose a novel class imbalance processing technology for large-scale dataset, referred to as SGM, which combines Synthetic Minority Over-Sampling Technique (SMOTE) and under-sampling for clustering based on Gaussian Mixture Model (GMM). We then design a flow-based intrusion detection model, SGM-CNN, which integrates imbalanced class processing with convolutional neural network, and investigate the impact of different numbers of convolution kernels and different learning rates on model performance. The advantages of the proposed model are verified using the UNSW-NB15 and CICIDS2017 datasets. The experimental results show that i) for binary classification and multiclass classification on the UNSW-NB15 dataset, SGM-CNN achieves a detection rate of 99.74% and 96.54%, respectively; ii) for 15-class classification on the CICIDS2017 dataset, it achieves a detection rate of 99.85%. We compare five imbalanced processing methods and two classification algorithms, and conclude that SGM-CNN provides an effective solution to imbalanced intrusion detection and outperforms the state-of-the-art intrusion detection methods.

[1]  Abien Fred Agarap A Neural Network Architecture Combining Gated Recurrent Unit (GRU) and Support Vector Machine (SVM) for Intrusion Detection in Network Traffic Data , 2017, ICMLC.

[2]  Nitesh V. Chawla,et al.  SMOTE: Synthetic Minority Over-sampling Technique , 2002, J. Artif. Intell. Res..

[3]  Phurivit Sangkatsanee,et al.  Practical real-time intrusion detection using machine learning approaches , 2011, Comput. Commun..

[4]  Francisco Herrera,et al.  A Review on Ensembles for the Class Imbalance Problem: Bagging-, Boosting-, and Hybrid-Based Approaches , 2012, IEEE Transactions on Systems, Man, and Cybernetics, Part C (Applications and Reviews).

[5]  Jill Slay,et al.  The evaluation of Network Anomaly Detection Systems: Statistical analysis of the UNSW-NB15 data set and the comparison with the KDD99 data set , 2016, Inf. Secur. J. A Glob. Perspect..

[6]  Zhi-Hua Zhou,et al.  Exploratory Under-Sampling for Class-Imbalance Learning , 2006, Sixth International Conference on Data Mining (ICDM'06).

[7]  Miad Faezipour,et al.  Features Dimensionality Reduction Approaches for Machine Learning Based Network Intrusion Detection , 2019, Electronics.

[8]  Zhong Ming,et al.  An improved NSGA-III algorithm for feature selection used in intrusion detection , 2017, Knowl. Based Syst..

[9]  Ali A. Ghorbani,et al.  An Evaluation Framework for Intrusion Detection Dataset , 2016, 2016 International Conference on Information Science and Security (ICISS).

[10]  Adel Binbusayyis,et al.  Identifying and Benchmarking Key Features for Cyber Intrusion Detection: An Ensemble Approach , 2019, IEEE Access.

[11]  David A. Cieslak,et al.  Combating imbalance in network intrusion datasets , 2006, 2006 IEEE International Conference on Granular Computing.

[12]  Haibo He,et al.  ADASYN: Adaptive synthetic sampling approach for imbalanced learning , 2008, 2008 IEEE International Joint Conference on Neural Networks (IEEE World Congress on Computational Intelligence).

[13]  Yang-Wai Chow,et al.  Interactive three-dimensional visualization of network intrusion detection data for machine learning , 2020, Future Gener. Comput. Syst..

[14]  Sajjan G. Shiva,et al.  Comparative Analysis of ML Classifiers for Network Intrusion Detection , 2019, ICICT.

[15]  Simon Fong,et al.  Multi-stage optimization of a deep model: A case study on ground motion modeling , 2018, PloS one.

[16]  M. A. Jabbar,et al.  Random Forest Modeling for Network Intrusion Detection System , 2016 .

[17]  D. Lalitha Bhaskari,et al.  Intrusion Detection Using Random Forests Classifier with SMOTE and Feature Reduction , 2013, 2013 International Conference on Cloud & Ubiquitous Computing & Emerging Technologies.

[18]  Mei Song,et al.  PCCN: Parallel Cross Convolutional Neural Network for Abnormal Network Traffic Flows Detection in Multi-Class Imbalanced Network Traffic Flows , 2019, IEEE Access.

[19]  Puja Padiya,et al.  Feature Selection Based Hybrid Anomaly Intrusion Detection System Using K Means and RBF Kernel Function , 2015 .

[20]  Bo Lang,et al.  Machine Learning and Deep Learning Methods for Intrusion Detection Systems: A Survey , 2019, Applied Sciences.

[21]  Chase Qishi Wu,et al.  An Effective Deep Learning Based Scheme for Network Intrusion Detection , 2018, 2018 24th International Conference on Pattern Recognition (ICPR).

[22]  Atsuto Maki,et al.  A systematic study of the class imbalance problem in convolutional neural networks , 2017, Neural Networks.

[23]  Leo Breiman,et al.  Random Forests , 2001, Machine Learning.

[24]  El-Sayed M. El-Alfy,et al.  A multiclass cascade of artificial neural network for network intrusion detection , 2017, J. Intell. Fuzzy Syst..

[25]  Salah El Hadaj,et al.  A Two-Stage Classifier Approach using RepTree Algorithm for Network Intrusion Detection , 2017 .

[26]  Ozgur Koray Sahingoz,et al.  Increasing the Performance of Machine Learning-Based IDSs on an Imbalanced and Up-to-Date Dataset , 2020, IEEE Access.

[27]  Nour Moustafa,et al.  UNSW-NB15: a comprehensive data set for network intrusion detection systems (UNSW-NB15 network data set) , 2015, 2015 Military Communications and Information Systems Conference (MilCIS).

[28]  Ali A. Ghorbani,et al.  Towards a Reliable Intrusion Detection Benchmark Dataset , 2017 .

[29]  M. Kameswara Rao,et al.  Network Intrusion Detection System Using Random Forest and Decision Tree Machine Learning Techniques , 2019, First International Conference on Sustainable Technologies for Computational Intelligence.

[30]  Yoshua Bengio,et al.  Random Search for Hyper-Parameter Optimization , 2012, J. Mach. Learn. Res..

[31]  Elena Sitnikova,et al.  Towards Developing Network forensic mechanism for Botnet Activities in the IoT based on Machine Learning Techniques , 2017, MONAMI.

[32]  Mansour Sheikhan,et al.  Intrusion detection using reduced-size RNN based on feature grouping , 2010, Neural Computing and Applications.

[33]  Jinoh Kim,et al.  A survey of deep learning-based network anomaly detection , 2017, Cluster Computing.

[34]  Erdogan Dogdu,et al.  Intrusion Detection Using Big Data and Deep Learning Techniques , 2019, ACM Southeast Regional Conference.

[35]  Qusay H. Mahmoud,et al.  A Two-Level Hybrid Model for Anomalous Activity Detection in IoT Networks , 2019, 2019 16th IEEE Annual Consumer Communications & Networking Conference (CCNC).

[36]  Yuefei Zhu,et al.  A Deep Learning Approach for Intrusion Detection Using Recurrent Neural Networks , 2017, IEEE Access.

[37]  Mohammad Zulkernine,et al.  Random-Forests-Based Network Intrusion Detection Systems , 2008, IEEE Transactions on Systems, Man, and Cybernetics, Part C (Applications and Reviews).

[38]  Jiankun Hu,et al.  An Approach for Host-Based Intrusion Detection System Design Using Convolutional Neural Network , 2017, MONAMI.

[39]  Ali A. Ghorbani,et al.  Toward Generating a New Intrusion Detection Dataset and Intrusion Traffic Characterization , 2018, ICISSP.

[40]  Jiajun Lin,et al.  A Multiple-Layer Representation Learning Model for Network-Based Attack Detection , 2019, IEEE Access.