A detailed analysis of the KDD CUP 99 data set

During the last decade, anomaly detection has attracted the attention of many researchers to overcome the weakness of signature-based IDSs in detecting novel attacks, and KDDCUP'99 is the mostly widely used data set for the evaluation of these systems. Having conducted a statistical analysis on this data set, we found two important issues which highly affects the performance of evaluated systems, and results in a very poor evaluation of anomaly detection approaches. To solve these issues, we have proposed a new data set, NSL-KDD, which consists of selected records of the complete KDD data set and does not suffer from any of mentioned shortcomings.

[1]  Salvatore J. Stolfo,et al.  Cost-based modeling for fraud and intrusion detection: results from the JAM project , 2000, Proceedings DARPA Information Survivability Conference and Exposition. DISCEX'00.

[2]  John S. Baras,et al.  A framework for the evaluation of intrusion detection systems , 2006, 2006 IEEE Symposium on Security and Privacy (S&P'06).

[3]  R.K. Cunningham,et al.  Evaluating intrusion detection systems: the 1998 DARPA off-line intrusion detection evaluation , 2000, Proceedings DARPA Information Survivability Conference and Exposition. DISCEX'00.

[4]  Giovanni Di Crescenzo,et al.  Towards a Theory of Intrusion Detection , 2005, ESORICS.

[5]  Leo Breiman,et al.  Random Forests , 2001, Machine Learning.

[6]  Chih-Jen Lin,et al.  LIBSVM: A library for support vector machines , 2011, TIST.

[7]  John McHugh,et al.  Testing Intrusion detection systems: a critique of the 1998 and 1999 DARPA intrusion detection system evaluations as performed by Lincoln Laboratory , 2000, TSEC.

[8]  Guofei Gu,et al.  Measuring intrusion detection capability: an information-theoretic approach , 2006, ASIACCS '06.

[9]  John E. Gaffney,et al.  Evaluation of intrusion detectors: a decision theory approach , 2001, Proceedings 2001 IEEE Symposium on Security and Privacy. S&P 2001.

[10]  Bruce W. Suter,et al.  The multilayer perceptron as an approximation to a Bayes optimal discriminant function , 1990, IEEE Trans. Neural Networks.

[11]  J. Ross Quinlan,et al.  C4.5: Programs for Machine Learning , 1992 .

[12]  Satinder Singh,et al.  Unsupervised Anomaly Detection in Network Intrusion Detection Using Clusters , 2005, ACSC.

[13]  Leonid Portnoy,et al.  Intrusion detection with unlabeled data using clustering , 2000 .

[14]  Philip K. Chan,et al.  An Analysis of the 1999 DARPA/Lincoln Laboratory Evaluation Data for Network Anomaly Detection , 2003, RAID.

[15]  Alberto Maria Segre,et al.  Programs for Machine Learning , 1994 .

[16]  David Aldous,et al.  The Continuum Random Tree III , 1991 .

[17]  Ron Kohavi,et al.  Scaling Up the Accuracy of Naive-Bayes Classifiers: A Decision-Tree Hybrid , 1996, KDD.

[18]  Carl E. Landwehr,et al.  A taxonomy of computer program security flaws , 1993, CSUR.

[19]  Pat Langley,et al.  Estimating Continuous Distributions in Bayesian Classifiers , 1995, UAI.

[20]  M. Shyu,et al.  A Novel Anomaly Detection Scheme Based on Principal Component Classifier , 2003 .

[21]  Stefan Axelsson,et al.  The base-rate fallacy and the difficulty of intrusion detection , 2000, TSEC.