Data redundancy may lead to unreliable intrusion detection systems

An Intrusion Detection System (IDS) aims to protect a network against attacks intended to expose and/or vandalize it. To build and test an IDS, network data containing both attacks and normal behavior are usually acquired. The objective of this work is to use machine learning techniques to build IDSs and to investigate their reliability. To build and test the IDSs, KDDCUP99 has been used. The data contain a training set and a testing set with 4,898,430 samples (∼700MB) and 311,032 samples (∼45MB), respectively. However, cleaning the dataset with SQL commands shows that KDDCUP99 is highly redundant: the cleaned (distinct) data are nearly one fifth of the original. Subsequently, experiments were performed using neural-network-based IDSs. Some IDSs gave low performance when tested on the redundant data and median performance when tested on the distinct data, whereas other IDSs gave high performance on the redundant data and median performance on the distinct data. Thus, performance fluctuates when the data are redundant, which shows that an IDS built using a redundant dataset has unstable performance. The balanced dataset is prepared only to test the realistic performance of the IDS; it has no relation to IDS generalization or to implementation in real-world scenarios.
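
The abstract does not give the SQL commands used for cleaning, so the following is only a minimal sketch of the general idea: load records into a table and keep the distinct rows. The table name `kdd`, the four columns, and the toy records are hypothetical stand-ins; the real KDDCUP99 records carry 41 features plus a label.

```python
import sqlite3

# Hypothetical toy records: (duration, protocol, service, label).
# These are illustrative only and are not taken from KDDCUP99.
rows = [
    (0, "tcp", "http", "normal"),
    (0, "tcp", "http", "normal"),
    (0, "icmp", "ecr_i", "smurf"),
    (0, "icmp", "ecr_i", "smurf"),
    (0, "icmp", "ecr_i", "smurf"),
    (2, "tcp", "smtp", "normal"),
]


def distinct_rows(records):
    """Return the distinct records, using SQL SELECT DISTINCT."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE kdd ("
        "duration INTEGER, protocol TEXT, service TEXT, label TEXT)"
    )
    conn.executemany("INSERT INTO kdd VALUES (?, ?, ?, ?)", records)
    # DISTINCT collapses rows that are identical in every column.
    result = conn.execute(
        "SELECT DISTINCT duration, protocol, service, label FROM kdd "
        "ORDER BY duration, protocol, service, label"
    ).fetchall()
    conn.close()
    return result


unique = distinct_rows(rows)
# Fraction of the original records that survive deduplication.
ratio = len(unique) / len(rows)
print(f"{len(rows)} records, {len(unique)} distinct, ratio {ratio:.2f}")
```

Comparing the distinct count with the original count, as in the last lines, is one way to quantify how redundant a dataset is before using it to evaluate a classifier.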
