Cost-Sensitive Distributed Machine Learning for NetFlow-Based Botnet Activity Detection

Recent advances in malicious techniques have rendered the traditional signature-based approach to cyberattack detection ineffective. New, more potent solutions are needed that combine Big Data technologies, effective distributed machine learning, and algorithms that counter the data imbalance problem. The major contribution of this paper is therefore a cost-sensitive distributed machine learning approach for cybersecurity. In particular, we propose and implement cost-sensitive distributed machine learning by means of distributed Extreme Learning Machines (ELM), distributed Random Forest, and Distributed Random Boosted-Trees to detect botnets. The system’s concept and architecture are based on a Big Data processing framework with data mining and machine learning techniques. As a practical use case, we consider the problem of botnet detection by analysing data in the form of NetFlows. The reported results are promising and show that the proposed system can be considered a useful tool for improving cybersecurity.
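
To illustrate why cost sensitivity matters under data imbalance, the following is a minimal sketch and not the method used in the paper. It assumes a common "balanced" heuristic in which each class is weighted by n_samples / (n_classes * class_count); the function names, the toy labels, and the 95/5 class split are hypothetical and chosen only for illustration.

```python
from collections import Counter


def balanced_class_weights(labels):
    """Assumed heuristic: weight = n_samples / (n_classes * class_count)."""
    counts = Counter(labels)
    n_samples, n_classes = len(labels), len(counts)
    return {cls: n_samples / (n_classes * cnt) for cls, cnt in counts.items()}


def weighted_error(y_true, y_pred, weights):
    """Misclassification rate where each sample counts with its true-class weight."""
    total = sum(weights[t] for t in y_true)
    wrong = sum(weights[t] for t, p in zip(y_true, y_pred) if t != p)
    return wrong / total


# Hypothetical toy data: 95 normal flows and 5 botnet flows.
y_true = ["normal"] * 95 + ["botnet"] * 5
# A trivial classifier that never raises an alert.
y_pred = ["normal"] * 100

weights = balanced_class_weights(y_true)
plain_error = sum(t != p for t, p in zip(y_true, y_pred)) / len(y_true)
cost_error = weighted_error(y_true, y_pred, weights)
```

On this toy data the unweighted error of the trivial classifier is 5%, which looks acceptable, while the cost-weighted error is about 50%, because every missed botnet flow is penalised far more heavily than a misclassified normal flow. A cost-sensitive learner optimises the second quantity rather than the first.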
