The evaluation of Network Anomaly Detection Systems: Statistical analysis of the UNSW-NB15 data set and the comparison with the KDD99 data set

ABSTRACT Over the last three decades, Network Intrusion Detection Systems (NIDSs), and Anomaly Detection Systems (ADSs) in particular, have become more significant in detecting novel attacks than Signature Detection Systems (SDSs). Evaluating NIDSs on the existing benchmark data sets KDD99 and NSLKDD does not yield satisfactory results because of three major issues: (1) their lack of modern low-footprint attack styles, (2) their lack of modern normal traffic scenarios, and (3) the different distributions of their training and testing sets. To address these issues, the UNSW-NB15 data set has recently been generated. This data set covers nine types of modern attacks and new patterns of normal traffic, and it contains 49 attributes, comprising flow-based features between hosts and features from network packet inspection, that discriminate between normal and abnormal observations. In this paper, we demonstrate the complexity of the UNSW-NB15 data set in three aspects. First, the statistical analysis of the observations and the attributes is explained. Second, feature correlations are examined. Third, five existing classifiers are used to evaluate the complexity in terms of accuracy and false alarm rates (FARs), and the results are then compared with those on the KDD99 data set. The experimental results show that UNSW-NB15 is more complex than KDD99 and can be considered a new benchmark data set for evaluating NIDSs.
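As a rough illustration of the two evaluation measures named in the abstract, the sketch below computes accuracy and a false alarm rate from binary confusion-matrix counts. The abstract does not give the formulas, so the definitions here are assumptions: accuracy is taken as the share of correct predictions, and FAR as the common false positive rate FP / (FP + TN), which may differ from the definition used in the paper. The function name and the example counts are hypothetical.

```python
def accuracy_and_far(tp: int, tn: int, fp: int, fn: int) -> tuple[float, float]:
    """Return (accuracy, false alarm rate) for a binary normal/attack classifier.

    Assumed definitions (not taken from the paper):
      accuracy = (TP + TN) / (TP + TN + FP + FN)
      FAR      = FP / (FP + TN), i.e. normal records wrongly flagged as attacks
    """
    total = tp + tn + fp + fn
    if total == 0:
        raise ValueError("confusion matrix is empty")
    accuracy = (tp + tn) / total
    negatives = fp + tn
    # With no normal records there is nothing to raise a false alarm on.
    far = fp / negatives if negatives else 0.0
    return accuracy, far


# Hypothetical counts, chosen only to exercise the function.
acc, far = accuracy_and_far(tp=80, tn=90, fp=10, fn=20)
print(f"accuracy={acc:.2f}, FAR={far:.2f}")
```

Under these assumed definitions, a data set counts as harder when the same classifier reaches a lower accuracy and a higher FAR on it.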
