Federated Learning for Malware Detection in IoT Devices

Billions of IoT devices lacking proper security mechanisms have been manufactured and deployed in recent years. Their vulnerability to malware has created a need for efficient techniques to detect infected IoT devices inside networks. As privacy has become a major concern, a new technology called federated learning has emerged. It allows machine learning models to be trained on decentralized data while preserving the privacy of that data by design. This work investigates the possibilities that federated learning opens up for IoT malware detection and studies the security issues inherent to this new learning paradigm. In this context, a framework that uses federated learning to detect malware affecting IoT devices is presented. N-BaIoT, a dataset modeling the network traffic of several real IoT devices while affected by malware, has been used to evaluate the proposed framework. Both supervised and unsupervised federated models (a multi-layer perceptron and an autoencoder, respectively) able to detect malware affecting seen and unseen IoT devices of N-BaIoT have been trained and evaluated. Furthermore, their performance has been compared to two traditional approaches. In the first, each participant trains a model locally using only its own data; in the second, the participants share their data with a central entity in charge of training a global model. This comparison shows that using larger and more diverse data, as the federated and centralized methods do, has a considerable positive impact on model performance. Moreover, the federated models, while preserving the participants' privacy, achieve results similar to those of the centralized ones. As an additional contribution, and to measure the robustness of the federated approach, an adversarial setup with several malicious participants poisoning the federated model has been considered. The averaging step used as the baseline model aggregation in most federated learning algorithms proves highly vulnerable to different attacks, even with a single adversary. The performance of other model aggregation functions acting as countermeasures is therefore evaluated under the same attack scenarios. These functions provide a significant improvement against malicious participants, but more effort is still needed to make federated approaches robust.
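
To illustrate why plain averaging is fragile, the following minimal sketch compares three aggregation rules on toy parameter vectors when one participant submits a poisoned update. The abstract does not name the countermeasures that were evaluated, so the coordinate-wise median and trimmed mean used here are assumptions, chosen as well-known robust aggregators; the function names and the toy numbers are hypothetical and are not taken from the paper's experiments.

```python
from statistics import median


def federated_average(updates, weights=None):
    """Weighted coordinate-wise mean of the participants' parameter vectors."""
    if weights is None:
        weights = [1.0] * len(updates)
    total = sum(weights)
    dim = len(updates[0])
    return [
        sum(w * u[i] for w, u in zip(weights, updates)) / total
        for i in range(dim)
    ]


def coordinate_median(updates):
    """Coordinate-wise median (assumed countermeasure, not confirmed by the source)."""
    dim = len(updates[0])
    return [median(u[i] for u in updates) for i in range(dim)]


def trimmed_mean(updates, trim):
    """Per coordinate, drop the `trim` smallest and largest values, then average."""
    n = len(updates)
    if 2 * trim >= n:
        raise ValueError("trim removes every update")
    dim = len(updates[0])
    aggregated = []
    for i in range(dim):
        values = sorted(u[i] for u in updates)
        kept = values[trim:n - trim]
        aggregated.append(sum(kept) / len(kept))
    return aggregated


# Hypothetical toy data: four honest updates close to (1, 2), one poisoned update.
honest = [[1.0, 2.0], [1.2, 1.8], [0.8, 2.2], [1.0, 2.0]]
poisoned = honest + [[100.0, -100.0]]

print("average:     ", federated_average(poisoned))
print("median:      ", coordinate_median(poisoned))
print("trimmed mean:", trimmed_mean(poisoned, trim=1))
```

In this toy case the single adversary drags the average far from the honest values, while the median and the trimmed mean stay close to them. This only demonstrates the general mechanism; it says nothing about how these rules behave against the specific attacks studied in the work.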
