Non-IID data re-balancing at IoT edge with peer-to-peer federated learning for anomaly detection

The increase of the computational power in edge devices has enabled the penetration of distributed machine learning technologies such as federated learning, which allows to build collaborative models performing the training locally in the edge devices, improving the efficiency and the privacy for training of machine learning models, as the data remains in the edge devices. However, in some IoT networks the connectivity between devices and system components can be limited, which prevents the use of federated learning, as it requires a central node to orchestrate the training of the model. To sidestep this, peer-to-peer learning appears as a promising solution, as it does not require such an orchestrator. On the other side, the security challenges in IoT deployments have fostered the use of machine learning for attack and anomaly detection. In these problems, under supervised learning approaches, the training datasets are typically imbalanced, i.e. the number of anomalies is very small compared to the number of benign data points, which requires the use of re-balancing techniques to improve the algorithms' performance. In this paper, we propose a novel peer-to-peer algorithm,P2PK-SMOTE, to train supervised anomaly detection machine learning models in non-IID scenarios, including mechanisms to locally re-balance the training datasets via synthetic generation of data points from the minority class. To improve the performance in non-IID scenarios, we also include a mechanism for sharing a small fraction of synthetic data from the minority class across devices, aiming to reduce the risk of data de-identification. Our experimental evaluation in real datasets for IoT anomaly detection across a different set of scenarios validates the benefits of our proposed approach.

[1]  Anit Kumar Sahu,et al.  Federated Optimization in Heterogeneous Networks , 2018, MLSys.

[2]  Masahiro Morikura,et al.  Hybrid-FL for Wireless Networks: Cooperative Learning Mechanism Using Non-IID Data , 2019, ICC 2020 - 2020 IEEE International Conference on Communications (ICC).

[3]  Hubert Eichner,et al.  Towards Federated Learning at Scale: System Design , 2019, SysML.

[4]  Eric Lin,et al.  FedFMC: Sequential Efficient Federated Learning on Non-iid Data , 2020, ArXiv.

[5]  Nitesh V. Chawla,et al.  Data Mining for Imbalanced Datasets: An Overview , 2005, The Data Mining and Knowledge Discovery Handbook.

[6]  Rachid Guerraoui,et al.  Machine Learning with Adversaries: Byzantine Tolerant Gradient Descent , 2017, NIPS.

[7]  Fabio Roli,et al.  Evasion Attacks against Machine Learning at Test Time , 2013, ECML/PKDD.

[8]  Blaise Agüera y Arcas,et al.  Communication-Efficient Learning of Deep Networks from Decentralized Data , 2016, AISTATS.

[9]  Sarvar Patel,et al.  Practical Secure Aggregation for Privacy-Preserving Machine Learning , 2017, IACR Cryptol. ePrint Arch..

[10]  Elena Sitnikova,et al.  Towards the Development of Realistic Botnet Dataset in the Internet of Things for Network Forensic Analytics: Bot-IoT Dataset , 2018, Future Gener. Comput. Syst..

[11]  Nitesh V. Chawla,et al.  SMOTE: Synthetic Minority Over-sampling Technique , 2002, J. Artif. Intell. Res..

[12]  Kenneth T. Co,et al.  Byzantine-Robust Federated Machine Learning through Adaptive Model Averaging , 2019, ArXiv.

[13]  Olac Fuentes,et al.  A Distance-Based Over-Sampling Method for Learning from Imbalanced Data Sets , 2007, FLAIRS.

[14]  Prateek Mittal,et al.  Analyzing Federated Learning through an Adversarial Lens , 2018, ICML.

[15]  Samuel Marchal,et al.  DÏoT: A Federated Self-learning Anomaly Detection System for IoT , 2018, 2019 IEEE 39th International Conference on Distributed Computing Systems (ICDCS).

[16]  Nassir Navab,et al.  BrainTorrent: A Peer-to-Peer Environment for Decentralized Federated Learning , 2019, ArXiv.

[17]  Anit Kumar Sahu,et al.  Federated Learning: Challenges, Methods, and Future Directions , 2019, IEEE Signal Processing Magazine.

[18]  Yuval Elovici,et al.  N-BaIoT—Network-Based Detection of IoT Botnet Attacks Using Deep Autoencoders , 2018, IEEE Pervasive Computing.

[19]  Peter Richtárik,et al.  Federated Learning: Strategies for Improving Communication Efficiency , 2016, ArXiv.

[20]  Klaus-Robert Müller,et al.  Robust and Communication-Efficient Federated Learning From Non-i.i.d. Data , 2019, IEEE Transactions on Neural Networks and Learning Systems.

[21]  Jakub Konecný,et al.  Federated Optimization: Distributed Optimization Beyond the Datacenter , 2015, ArXiv.

[22]  Kannan Ramchandran,et al.  Byzantine-Robust Distributed Learning: Towards Optimal Statistical Rates , 2018, ICML.

[23]  Moming Duan,et al.  Astraea: Self-Balancing Federated Learning for Improving Classification Accuracy of Mobile Deep Learning Applications , 2019, 2019 IEEE 37th International Conference on Computer Design (ICCD).

[24]  Aleksander Madry,et al.  Towards Deep Learning Models Resistant to Adversarial Attacks , 2017, ICLR.

[25]  Mehdi Bennis,et al.  Communication-Efficient On-Device Machine Learning: Federated Distillation and Augmentation under Non-IID Private Data , 2018, ArXiv.

[26]  Francisco Herrera,et al.  SMOTE for Learning from Imbalanced Data: Progress and Challenges, Marking the 15-year Anniversary , 2018, J. Artif. Intell. Res..

[27]  Joan Bruna,et al.  Intriguing properties of neural networks , 2013, ICLR.

[28]  Sepideh Pashami,et al.  Decentralized and Adaptive K-Means Clustering for Non-IID Data Using HyperLogLog Counters , 2020, PAKDD.

[29]  Kin K. Leung,et al.  When Edge Meets Learning: Adaptive Control for Resource-Constrained Distributed Machine Learning , 2018, IEEE INFOCOM 2018 - IEEE Conference on Computer Communications.