A review of KDD99 dataset usage in intrusion detection and machine learning between 2010 and 2015

Although the KDD99 dataset is more than 15 years old, it is still widely used in academic research. To investigate the wide usage of this dataset in Machine Learning Research (MLR) and Intrusion Detection Systems (IDS), this study reviews 149 research articles from 65 journals indexed in Science Citation Index Expanded and Emerging Sources Citation Index during the last six years (2010–2015). If papers from other indexes and conferences were included, the number of studies would triple. The number of published studies shows that KDD99 is the most used dataset in the IDS and machine learning areas, and that it is the de facto dataset for these research areas. To show recent usage of KDD99 and the related sub-dataset (NSL-KDD) in IDS and MLR, the following descriptive statistics about the reviewed studies are given: main contribution of articles, the applied algorithms,
