Toward a reliable anomaly-based intrusion detection in real-world environments

Abstract A popular approach for detecting network intrusion attempts is to monitor the network traffic for anomalies. Extensive research effort has been invested in anomaly-based network intrusion detection using machine learning techniques; however, in general these techniques remain a research topic, rarely being used in real-world environments. In general, the approaches proposed in the literature lack representative datasets and reliable evaluation methods that consider real-world network properties during the system evaluation. In general, the approaches adopt a set of assumptions about the training data, as well as about the validation methods, rendering the created system unreliable for open-world usage. This paper presents a new method for creating intrusion databases. The objective is that the databases should be easy to update and reproduce with real and valid traffic, representative, and publicly available. Using our proposed method, we propose a new evaluation scheme specific to the machine learning intrusion detection field. Sixteen intrusion databases were created, and each of the assumptions frequently adopted in studies in the intrusion detection literature regarding network traffic behavior was validated. To make machine learning detection schemes feasible, we propose a new multi-objective feature selection method that considers real-world network properties. The results show that most of the assumptions frequently applied in studies in the literature do not hold when using a machine learning detection scheme for network-based intrusion detection. However, the proposed multi-objective feature selection method allows the system accuracy to be improved by considering real-world network properties during the model creation process.

[1]  Jiankun Hu,et al.  Generation of a new IDS test dataset: Time to retire the KDD collection , 2013, 2013 IEEE Wireless Communications and Networking Conference (WCNC).

[2]  Adam Prügel-Bennett,et al.  Data mining approaches for network intrusion detection: from dimensionality reduction to misuse and anomaly detection , 2012 .

[3]  Dong Seong Kim,et al.  Genetic algorithm to improve SVM based network intrusion detection system , 2005, 19th International Conference on Advanced Information Networking and Applications (AINA'05) Volume 1 (AINA papers).

[4]  Luiz Eduardo Soares de Oliveira,et al.  Obtaining the threat model for e-mail phishing , 2013, Appl. Soft Comput..

[5]  Jugal K. Kalita,et al.  Towards Generating Real-life Datasets for Network Intrusion Detection , 2015, Int. J. Netw. Secur..

[6]  Carrie Gates,et al.  Challenging the anomaly detection paradigm: a provocative discussion , 2006, NSPW '06.

[7]  Zahid Anwar,et al.  A fast and scalable technique for constructing multicast routing trees with optimized quality of service using a firefly based genetic algorithm , 2014, Multimedia Tools and Applications.

[8]  Kim-Kwang Raymond Choo,et al.  User profiling in intrusion detection: A review , 2016, J. Netw. Comput. Appl..

[9]  Stefan Axelsson,et al.  The base-rate fallacy and the difficulty of intrusion detection , 2000, TSEC.

[10]  Andrew Meneely,et al.  Are Intrusion Detection Studies Evaluated Consistently? A Systematic Literature Review , 2016 .

[11]  Kristopher Kendall,et al.  A Database of Computer Attacks for the Evaluation of Intrusion Detection Systems , 1999 .

[12]  Antonio Martínez-Álvarez,et al.  Feature selection by multi-objective optimisation: Application to network anomaly detection by hierarchical self-organising maps , 2014, Knowl. Based Syst..

[13]  Yao Zheng,et al.  DDoS Attack Protection in the Era of Cloud Computing and Software-Defined Networking , 2014, 2014 IEEE 22nd International Conference on Network Protocols.

[14]  Giovanni Vigna,et al.  Prophiler: a fast filter for the large-scale detection of malicious web pages , 2011, WWW.

[15]  Habib M. Ammari,et al.  On the energy-delay trade-off in geographic forwarding in always-on wireless sensor networks: A multi-objective optimization problem , 2013, Comput. Networks.

[16]  Yang Liu,et al.  Collaborative Security , 2015, ACM Comput. Surv..

[17]  Verónica Bolón-Canedo,et al.  Feature selection and classification in multiple class datasets: An application to KDD Cup 99 dataset , 2011, Expert Syst. Appl..

[18]  Dorothy E. Denning,et al.  An Intrusion-Detection Model , 1987, IEEE Transactions on Software Engineering.

[19]  Sally Floyd,et al.  Wide-area traffic: the failure of Poisson modeling , 1994 .

[20]  Philip K. Chan,et al.  An Analysis of the 1999 DARPA/Lincoln Laboratory Evaluation Data for Network Anomaly Detection , 2003, RAID.

[21]  Wei-Yang Lin,et al.  Intrusion detection by machine learning: A review , 2009, Expert Syst. Appl..

[22]  Kalyanmoy Deb,et al.  A fast and elitist multiobjective genetic algorithm: NSGA-II , 2002, IEEE Trans. Evol. Comput..

[23]  Ali A. Ghorbani,et al.  Toward developing a systematic approach to generate benchmark datasets for intrusion detection , 2012, Comput. Secur..

[24]  David Darmon,et al.  Understanding the Predictive Power of Computational Mechanics and Echo State Networks in Social Media , 2013, 1306.6111.

[25]  Fang Liu,et al.  Characterizing User Behavior in Mobile Internet , 2015, IEEE Transactions on Emerging Topics in Computing.

[26]  Lajos Hanzo,et al.  A Survey of Multi-Objective Optimization in Wireless Sensor Networks: Metrics, Algorithms, and Open Problems , 2016, IEEE Communications Surveys & Tutorials.

[27]  Grenville J. Armitage,et al.  A survey of techniques for internet traffic classification using machine learning , 2008, IEEE Communications Surveys & Tutorials.

[28]  Xiangjian He,et al.  RePIDS: A multi tier Real-time Payload-based Intrusion Detection System , 2013, Comput. Networks.

[29]  John McHugh,et al.  Testing Intrusion detection systems: a critique of the 1998 and 1999 DARPA intrusion detection system evaluations as performed by Lincoln Laboratory , 2000, TSEC.

[30]  Luiz Eduardo Soares de Oliveira,et al.  A Methodology for Feature Selection Using Multiobjective Genetic Algorithms for Handwritten Digit String Recognition , 2003, Int. J. Pattern Recognit. Artif. Intell..

[31]  Ali A. Ghorbani,et al.  A detailed analysis of the KDD CUP 99 data set , 2009, 2009 IEEE Symposium on Computational Intelligence for Security and Defense Applications.

[32]  Vern Paxson,et al.  Outside the Closed World: On Using Machine Learning for Network Intrusion Detection , 2010, 2010 IEEE Symposium on Security and Privacy.

[33]  B. Muthukumar,et al.  Intrusion Detection System (IDS): Anomaly Detection Using Outlier Detection Approach , 2015 .

[34]  Lorie M. Liebrock,et al.  Authentication graphs: Analyzing user behavior within an enterprise network , 2015, Comput. Secur..

[35]  Samuel Kounev,et al.  Evaluating Computer Intrusion Detection Systems , 2015, ACM Comput. Surv..

[36]  Haruna Chiroma,et al.  A Review of the Advances in Cyber Security Benchmark Datasets for Evaluating Data-Driven Based Intrusion Detection Systems , 2015, SCSE.

[37]  Dong Hyun Jeong,et al.  A multi-level intrusion detection method for abnormal network behaviors , 2016, J. Netw. Comput. Appl..

[38]  Vincent Nicomette,et al.  Automated Evaluation of Network Intrusion Detection Systems in IaaS Clouds , 2015, 2015 11th European Dependable Computing Conference (EDCC).

[39]  Md. Abu Naser Bikas,et al.  An Implementation of Intrusion Detection System Using Genetic Algorithm , 2012, ArXiv.

[40]  Usama M. Fayyad,et al.  Multi-Interval Discretization of Continuous-Valued Attributes for Classification Learning , 1993, IJCAI.

[41]  Luiz Eduardo Soares de Oliveira,et al.  Towards an Energy-Efficient Anomaly-Based Intrusion Detection Engine for Embedded Systems , 2017, IEEE Transactions on Computers.

[42]  S. Janakiraman,et al.  An Intelligent Distributed Intrusion Detection System using Genetic Algorithm , 2009, J. Convergence Inf. Technol..

[43]  R.K. Cunningham,et al.  Evaluating intrusion detection systems: the 1998 DARPA off-line intrusion detection evaluation , 2000, Proceedings DARPA Information Survivability Conference and Exposition. DISCEX'00.

[44]  Peter Mountney,et al.  Learning without Labeling: Domain Adaptation for Ultrasound Transducer Localization , 2013, MICCAI.