Paper information - A detailed analysis of the KDD CUP 99 data set

During the last decade, anomaly detection has attracted the attention of many researchers seeking to overcome the weakness of signature-based IDSs in detecting novel attacks, and KDDCUP'99 is the most widely used data set for the evaluation of these systems. Having conducted a statistical analysis of this data set, we found two important issues that strongly affect the performance of the evaluated systems and result in a very poor evaluation of anomaly detection approaches. To solve these issues, we have proposed a new data set, NSL-KDD, which consists of selected records from the complete KDD data set and does not suffer from any of the mentioned shortcomings.
