CBFS: A Clustering-Based Feature Selection Mechanism for Network Anomaly Detection

Network traffic flows contain a large number of correlated and redundant features that significantly degrade the performance of data-driven network anomaly detection. In this paper, we propose CBFS, a novel clustering- and ranking-based feature selection scheme that removes redundant features from network traffic and can thereby substantially improve both the efficiency and the accuracy of feature-based network anomaly detection. CBFS first calculates the distance between feature vectors, merges the feature vectors into clusters, and selects the center of each cluster as its representative feature vector. It then combines the information gain and gain ratio of these representative features to further reduce the number of features retained after clustering. Finally, CBFS applies a decision-tree-based classifier to the resulting feature subset to detect abnormal traffic flows. Experimental results show that CBFS effectively reduces feature dimensionality across different datasets, achieving feature reduction rates of 20% to 70% and a cost-performance of up to 70% compared with benchmark methods.
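The first two CBFS stages (cluster the features, keep each cluster's center, then rank the survivors by information gain and gain ratio) can be illustrated with a minimal pure-Python sketch. The abstract does not specify the distance measure, the clustering algorithm, or how the two entropy scores are combined, so everything below rests on stated assumptions: the distance is 1 minus the absolute Pearson correlation between feature columns, clustering is a simple greedy threshold scheme, the cluster "center" is the medoid, features are already discretized, and the ranking score is a weighted mean of information gain and gain ratio. The names `cluster_features`, `cbfs_sketch`, `threshold`, `keep`, and `alpha` are hypothetical and are not taken from the paper.

```python
import math
from collections import Counter


def entropy(values):
    """Shannon entropy (in bits) of a sequence of discrete values."""
    n = len(values)
    return -sum((c / n) * math.log2(c / n) for c in Counter(values).values())


def info_gain(feature, labels):
    """IG(Y; X) = H(Y) - H(Y | X) for a discrete feature X and class labels Y."""
    n = len(labels)
    cond = 0.0
    for v, count in Counter(feature).items():
        subset = [y for x, y in zip(feature, labels) if x == v]
        cond += (count / n) * entropy(subset)
    return entropy(labels) - cond


def gain_ratio(feature, labels):
    """C4.5-style gain ratio: information gain divided by split information H(X)."""
    split_info = entropy(feature)
    return info_gain(feature, labels) / split_info if split_info > 0 else 0.0


def pearson(a, b):
    """Pearson correlation of two equal-length numeric columns (0.0 if one is constant)."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    if va == 0 or vb == 0:
        return 0.0
    return cov / math.sqrt(va * vb)


def distance(a, b):
    """ASSUMPTION: feature distance = 1 - |Pearson correlation|."""
    return 1.0 - abs(pearson(a, b))


def cluster_features(columns, threshold=0.2):
    """ASSUMPTION: greedy clustering; a feature joins the first cluster whose
    seed (first member) lies within `threshold`, otherwise it starts a new cluster."""
    clusters = []
    for j, col in enumerate(columns):
        for cl in clusters:
            if distance(columns[cl[0]], col) <= threshold:
                cl.append(j)
                break
        else:
            clusters.append([j])
    return clusters


def medoid(cluster, columns):
    """Cluster center: the member with the smallest total distance to the others."""
    return min(cluster, key=lambda i: sum(distance(columns[i], columns[k]) for k in cluster))


def cbfs_sketch(columns, labels, threshold=0.2, keep=2, alpha=0.5):
    """Return the indices of the selected features, in ascending order."""
    reps = [medoid(cl, columns) for cl in cluster_features(columns, threshold)]

    def score(j):
        # ASSUMPTION: weighted mean of information gain and gain ratio.
        return alpha * info_gain(columns[j], labels) + (1 - alpha) * gain_ratio(columns[j], labels)

    return sorted(sorted(reps, key=score, reverse=True)[:keep])


# Toy data: feature 1 is a rescaled copy of feature 0 (redundant),
# feature 2 is weakly informative, feature 3 is moderately informative.
labels = [0, 0, 0, 1, 1, 1]
columns = [
    [0, 0, 0, 1, 1, 1],
    [0, 0, 0, 2, 2, 2],
    [1, 0, 1, 0, 1, 0],
    [0, 0, 1, 1, 1, 1],
]
print(cluster_features(columns))      # → [[0, 1], [2], [3]]
print(cbfs_sketch(columns, labels))   # → [0, 3]
```

In this toy run the clustering stage removes the redundant copy (feature 1), and the ranking stage then drops the weakly informative feature 2. The selected columns would subsequently be passed to a decision-tree classifier (for example, scikit-learn's `DecisionTreeClassifier`) for the detection stage, which this sketch omits to stay dependency-free.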
