Toward Generating a New Intrusion Detection Dataset and Intrusion Traffic Characterization

With exponential growth in the size of computer networks and developed applications, the significant increasing of the potential damage that can be caused by launching attacks is becoming obvious. Meanwhile, Intrusion Detection Systems (IDSs) and Intrusion Prevention Systems (IPSs) are one of the most important defense tools against the sophisticated and ever-growing network attacks. Due to the lack of adequate dataset, anomaly-based approaches in intrusion detection systems are suffering from accurate deployment, analysis and evaluation. There exist a number of such datasets such as DARPA98, KDD99, ISC2012, and ADFA13 that have been used by the researchers to evaluate the performance of their proposed intrusion detection and intrusion prevention approaches. Based on our study over eleven available datasets since 1998, many such datasets are out of date and unreliable to use. Some of these datasets suffer from lack of traffic diversity and volumes, some of them do not cover the variety of attacks, while others anonymized packet information and payload which cannot reflect the current trends, or they lack feature set and metadata. This paper produces a reliable dataset that contains benign and seven common attack network flows, which meets real world criteria and is publicly avaliable. Consequently, the paper evaluates the performance of a comprehensive set of network traffic features and machine learning algorithms to indicate the best set of features for detecting the certain attack categories.

[1]  Hirofumi Yamaki,et al.  Unknown Attacks Detection Using Feature Extraction from Anomaly-Based IDS Alerts , 2012, 2012 IEEE/IPSJ 12th International Symposium on Applications and the Internet.

[2]  Anil Somayaji,et al.  Analysis of the 1999 DARPA/Lincoln Laboratory IDS evaluation data with NetADHICT , 2009, 2009 IEEE Symposium on Computational Intelligence for Security and Defense Applications.

[3]  John Heidemann,et al.  Uses and Challenges for Network Datasets , 2009, 2009 Cybersecurity Applications & Technology Conference for Homeland Security.

[4]  Gregory J. Conti,et al.  Toward Instrumenting Network Warfare Competitions to Generate Labeled Datasets , 2009, CSET.

[5]  Joshua Ojo Nehinbe,et al.  A critical evaluation of datasets for investigating IDSs and IPSs researches , 2011, 2011 IEEE 10th International Conference on Cybernetic Intelligent Systems (CIS).

[6]  Jiankun Hu,et al.  Generation of a new IDS test dataset: Time to retire the KDD collection , 2013, 2013 IEEE Wireless Communications and Networking Conference (WCNC).

[7]  Ali A. Ghorbani,et al.  A detailed analysis of the KDD CUP 99 data set , 2009, 2009 IEEE Symposium on Computational Intelligence for Security and Defense Applications.

[8]  Hiroki Takakura,et al.  Statistical analysis of honeypot data and building of Kyoto 2006+ dataset for NIDS evaluation , 2011, BADGERS '11.

[9]  Paul D. Scott,et al.  Evaluating data mining procedures: techniques for generating artificial data sets , 1999, Inf. Softw. Technol..

[10]  Joshua Ojo Nehinbe,et al.  A Simple Method for Improving Intrusion Detections in Corporate Networks , 2009, ISDF.

[11]  Ali A. Ghorbani,et al.  Towards a Reliable Intrusion Detection Benchmark Dataset , 2017 .

[12]  John McHugh,et al.  Testing Intrusion detection systems: a critique of the 1998 and 1999 DARPA intrusion detection system evaluations as performed by Lincoln Laboratory , 2000, TSEC.

[13]  Ali A. Ghorbani,et al.  Toward developing a systematic approach to generate benchmark datasets for intrusion detection , 2012, Comput. Secur..

[14]  Jiankun Hu,et al.  Evaluating host-based anomaly detection systems: A preliminary analysis of ADFA-LD , 2013, 2013 6th International Congress on Image and Signal Processing (CISP).

[15]  Ali A. Ghorbani,et al.  Characterization of Tor Traffic using Time based Features , 2017, ICISSP.

[16]  Brian Neil Levine,et al.  Forensic investigation of the OneSwarm anonymous filesharing system , 2011, CCS '11.

[17]  Ali A. Ghorbani,et al.  Network Intrusion Detection and Prevention - Concepts and Techniques , 2010, Advances in Information Security.

[18]  Jiankun Hu,et al.  Evaluating host-based anomaly detection systems: Application of the one-class SVM algorithm to ADFA-LD , 2014, 2014 11th International Conference on Fuzzy Systems and Knowledge Discovery (FSKD).

[19]  Aiko Pras,et al.  A Labeled Data Set for Flow-Based Intrusion Detection , 2009, IPOM.

[20]  Ali A. Ghorbani,et al.  An Evaluation Framework for Intrusion Detection Dataset , 2016, 2016 International Conference on Information Science and Security (ICISS).

[21]  Huang Chuanhe,et al.  Anomaly Based Intrusion Detection Using Hybrid Learning Approach of Combining k-Medoids Clustering and Naïve Bayes Classification , 2012, 2012 8th International Conference on Wireless Communications, Networking and Mobile Computing.

[22]  Elliot Parker Proebstel,et al.  Characterizing and Improving Distributed Network-based Intrusion Detection Systems (NIDS): Timestamp Synchronization and Sampled Traffic , 2008 .