Towards Near-Real-Time Intrusion Detection for IoT Devices using Supervised Learning and Apache Spark

In Internet of Things (IoT) infrastructures, attack and anomaly detection are rising concerns. As IoT infrastructure spreads into every domain, the threats and attacks against it grow proportionally. In this paper we compare several machine learning algorithms for identifying cyber-attacks (namely SYN-DOS attacks) on IoT systems, both in terms of identification performance and in terms of training and application times. We use the supervised machine learning algorithms included in MLlib, the machine learning library of Apache Spark, a fast and general engine for big data processing. We present the implementation details and the performance of those algorithms on public datasets, using a training set of up to 2 million instances. We adopt a Cloud environment, emphasizing the importance of scalability and elasticity of use. Results show that all the Spark algorithms used achieve very good identification accuracy (>99%). One of them, Random Forest, achieves an accuracy of 1. We also report a very short training time (23.22 sec for Decision Tree with 2 million rows). The experiments also show a very low application time (0.13 sec for more than 600,000 instances for Random Forest) using Apache Spark in the Cloud. Furthermore, the explicit model generated by Random Forest is very easy to implement in high- or low-level programming languages. In light of these results, both in computation time and in identification performance, we propose a hybrid approach for detecting SYN-DOS cyber-attacks on IoT devices: an explicit Random Forest model implemented directly on the IoT device, combined with a second-level analysis (training) performed in the Cloud.
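To illustrate what an explicit Random Forest model implemented directly on a device could look like, the following is a minimal sketch in plain Python. It is not the model from the paper: the feature names (`syn_rate`, `syn_ack_ratio`, `half_open_conns`), the thresholds, and the three trees are hypothetical, and combining the trees by simple majority vote is an assumption. The point is only that each trained tree reduces to nested comparisons that are easy to port to any high- or low-level language.

```python
from collections import Counter
from typing import Callable, Dict, List

Features = Dict[str, float]

# Each tree is a hard-coded chain of threshold tests.
# All features and thresholds below are hypothetical placeholders;
# a real deployment would export them from the model trained in the Cloud.
# Label convention assumed here: 0 = benign traffic, 1 = SYN-DOS attack.

def tree_1(x: Features) -> int:
    if x["syn_rate"] <= 100.0:
        return 0
    return 1 if x["syn_ack_ratio"] > 3.0 else 0

def tree_2(x: Features) -> int:
    if x["half_open_conns"] <= 50.0:
        return 0
    return 1

def tree_3(x: Features) -> int:
    if x["syn_ack_ratio"] <= 3.0:
        return 0
    return 1 if x["syn_rate"] > 80.0 else 0

# An odd number of trees avoids ties for a binary label.
TREES: List[Callable[[Features], int]] = [tree_1, tree_2, tree_3]

def predict(x: Features) -> int:
    """Return the label chosen by the majority of the trees."""
    votes = Counter(tree(x) for tree in TREES)
    return votes.most_common(1)[0][0]

if __name__ == "__main__":
    benign = {"syn_rate": 10.0, "syn_ack_ratio": 1.0, "half_open_conns": 5.0}
    flood = {"syn_rate": 5000.0, "syn_ack_ratio": 40.0, "half_open_conns": 900.0}
    print(predict(benign), predict(flood))
```

Under the hybrid approach described above, only functions like `tree_1` would need to live on the device; retraining in the Cloud would regenerate the thresholds and trees, which could then be pushed to the device as an update.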
