Live Anomaly Detection based on Machine Learning Techniques SAD-F: Spark Based Anomaly Detection Framework

Anomaly detection is a crucial step in preventing malicious activity in a network and keeping resources available to legitimate users at all times. Various studies have noted that classical anomaly detectors work well with small, sampled data, but that the chance of failure increases with real-time (non-sampled) traffic data. In this paper, we explore security analytics techniques for DDoS anomaly detection using different machine learning techniques, and we propose a novel approach that takes real traffic as input to the system. We study and compare the performance of the proposed framework on three different testbeds: normal commodity hardware, a low-end system, and a high-end system. Hardware details of the testbeds are discussed in the respective section. We also investigate the performance of the classifiers in (near) real-time detection of anomalies and attacks. This study further focuses on the feature selection process, which is as important for anomaly detection as it is for general modeling problems. Several feature selection techniques have been studied, and we observe that proper feature selection can improve performance in terms of the model's execution time, which depends entirely on the traffic file or the traffic capturing process.
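To make the feature selection point concrete, the following is a minimal sketch of a filter-style selector that ranks features by their absolute Pearson correlation with the attack label and keeps the top k. It is not the selection method used in the framework, which the abstract does not specify. The feature names (`pkts_per_sec`, `syn_ratio`, `ttl`, `proto`), the flow values, and the labels are all invented for illustration, and the sketch uses plain Python rather than Spark so that it stays self-contained.

```python
import math


def pearson(xs, ys):
    """Pearson correlation; returns 0.0 when either side has zero variance."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / math.sqrt(var_x * var_y)


def select_features(rows, labels, k):
    """Return the k feature names most correlated (in absolute value) with the label."""
    names = list(rows[0].keys())
    scores = {
        name: abs(pearson([row[name] for row in rows], labels))
        for name in names
    }
    return sorted(names, key=lambda name: scores[name], reverse=True)[:k]


# Hypothetical per-flow records: names and values are assumptions, not data
# from the paper.
FLOWS = [
    {"pkts_per_sec": 12, "syn_ratio": 0.10, "ttl": 64, "proto": 6},
    {"pkts_per_sec": 15, "syn_ratio": 0.12, "ttl": 128, "proto": 6},
    {"pkts_per_sec": 900, "syn_ratio": 0.95, "ttl": 64, "proto": 6},
    {"pkts_per_sec": 1100, "syn_ratio": 0.97, "ttl": 128, "proto": 6},
    {"pkts_per_sec": 10, "syn_ratio": 0.08, "ttl": 64, "proto": 6},
    {"pkts_per_sec": 1000, "syn_ratio": 0.99, "ttl": 64, "proto": 6},
]
LABELS = [0, 0, 1, 1, 0, 1]  # 1 = attack, 0 = benign (made-up labels)

selected = select_features(FLOWS, LABELS, k=2)
print(selected)
```

In this toy data the constant `proto` column and the label-independent `ttl` column score near zero, so a classifier trained on the selected subset would process two columns per flow instead of four. That reduction in per-record work is the kind of execution-time saving the abstract refers to, although the actual gain depends on the classifier and the traffic.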
