Real-time Pattern Detection in IP Flow Data using Apache Spark

Detection of network attacks is a challenging task, especially with respect to detection coverage and timeliness. Defenders need to be able to detect advanced types of attacks and to minimize the time between attack detection and mitigation. To meet these requirements, we present a stream-based IP flow data processing application for real-time attack detection using similarity search techniques. Our approach extends the capabilities of traditional detection systems, allowing them to detect not only anomalies and attacks that exactly match predefined patterns but also their variations. We demonstrate the approach on the detection of SSH authentication attacks, describe the process of defining patterns, and illustrate their use in a real-world deployment. We show that our approach processes IP flow data fast enough for real-time detection while remaining versatile and able to detect network attacks that traditional approaches have not recognized.
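To make the idea of similarity-based matching more concrete, here is a minimal sketch in plain Python. It is not the application described above. It rests on several assumptions: each SSH authentication attempt shows up as one flow record, a pattern is a hypothetical (packets, bytes) vector describing a typical failed login, and similarity is a scaled Euclidean distance compared against a threshold. The `Flow` type, `PATTERN`, `SCALE`, `max_dist`, and `min_matches` are illustrative names and values only. A flow counts as a variation of the pattern when it falls within `max_dist` of it, so an exact match is not required.

```python
from collections import defaultdict
from dataclasses import dataclass
from math import sqrt


@dataclass(frozen=True)
class Flow:
    """Hypothetical, simplified IP flow record (only the fields used here)."""
    src_ip: str
    dst_ip: str
    dst_port: int
    packets: int
    bytes: int


# Hypothetical pattern of one failed SSH login flow: (packets, bytes).
PATTERN = (12.0, 2000.0)
# Hypothetical per-feature scale so that packets and bytes contribute comparably.
SCALE = (1.0, 100.0)


def distance(flow: Flow, pattern=PATTERN, scale=SCALE) -> float:
    """Scaled Euclidean distance between a flow's features and the pattern."""
    dp = (flow.packets - pattern[0]) / scale[0]
    db = (flow.bytes - pattern[1]) / scale[1]
    return sqrt(dp * dp + db * db)


def detect(flows, max_dist: float = 3.0, min_matches: int = 5):
    """Count SSH flows similar to the pattern per (source, destination) pair
    and report pairs with at least `min_matches` similar flows."""
    matches = defaultdict(int)
    for f in flows:
        # Only SSH traffic (port 22) is compared against the pattern.
        if f.dst_port == 22 and distance(f) <= max_dist:
            matches[(f.src_ip, f.dst_ip)] += 1
    return {pair: n for pair, n in matches.items() if n >= min_matches}


if __name__ == "__main__":
    window = [Flow("10.0.0.1", "10.0.0.2", 22, 12 + i % 2, 2000 + 100 * (i % 2))
              for i in range(6)]
    window.append(Flow("10.0.0.3", "10.0.0.2", 22, 500, 900000))  # bulk transfer
    print(detect(window))
```

In a streaming deployment, `detect` would run on each time window of incoming flows (for example, one Spark micro-batch) rather than on a static list. The distance function, the pattern, and both thresholds would need tuning against real traffic.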
