DFedForest: Decentralized Federated Forest

The effectiveness of machine learning systems depends heavily on the relevance of the training data. The collected data is often sensitive and private because it comes from devices and sensors used in people’s daily lives. The General Data Protection Regulation (GDPR) in Europe, the California Consumer Privacy Act (CCPA) in California, and China’s Cybersecurity Law put the current approach at risk, as they prohibit centralized remote processing of sensitive data collected in a distributed manner. This paper proposes a distributed machine learning system in which each domain builds a local random forest from decision trees shared through a blockchain. The results show that the proposed approach matches or exceeds the results obtained with random forests trained only on local data. Furthermore, the proposal improves the detection of new attacks when the domains have different threat distributions.
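The idea of assembling a local forest from trees shared between domains can be illustrated with a minimal sketch. It is not the paper's implementation. A hash-chained Python list stands in for the blockchain, depth-1 trees (stumps) stand in for full decision trees, and all names (`Ledger`, `train_stump`, `predict_forest`) and the synthetic two-domain data are hypothetical. Only models are published; the raw rows never leave their domain.

```python
import hashlib
import json
import random
from collections import Counter


def train_stump(rows, rng):
    """Fit a depth-1 tree on a bootstrap sample; rows are (features, label)."""
    sample = [rng.choice(rows) for _ in rows]        # bagging
    feature = rng.randrange(len(rows[0][0]))         # random feature choice
    best = None
    for x, _ in sample:
        thr = x[feature]
        left = [y for f, y in sample if f[feature] <= thr]
        right = [y for f, y in sample if f[feature] > thr]
        if not left or not right:
            continue
        l_lab = Counter(left).most_common(1)[0][0]
        r_lab = Counter(right).most_common(1)[0][0]
        err = sum(y != l_lab for y in left) + sum(y != r_lab for y in right)
        if best is None or err < best[0]:
            best = (err, thr, l_lab, r_lab)
    if best is None:                                 # degenerate sample
        maj = Counter(y for _, y in sample).most_common(1)[0][0]
        return {"feature": feature, "threshold": 0.0, "left": maj, "right": maj}
    return {"feature": feature, "threshold": best[1],
            "left": best[2], "right": best[3]}


def block_hash(domain, tree, prev):
    """Hash of a block's content, chained to the previous block's hash."""
    payload = json.dumps({"domain": domain, "tree": tree, "prev": prev},
                         sort_keys=True)
    return hashlib.sha256(payload.encode()).hexdigest()


class Ledger:
    """Append-only, hash-chained list: a stand-in for a real blockchain."""

    def __init__(self):
        self.blocks = []

    def publish(self, domain, tree):
        prev = self.blocks[-1]["hash"] if self.blocks else "0" * 64
        self.blocks.append({"domain": domain, "tree": tree, "prev": prev,
                            "hash": block_hash(domain, tree, prev)})

    def verify(self):
        prev = "0" * 64
        for b in self.blocks:
            if b["prev"] != prev or b["hash"] != block_hash(b["domain"], b["tree"], prev):
                return False
            prev = b["hash"]
        return True

    def trees(self):
        return [b["tree"] for b in self.blocks]


def predict_tree(tree, x):
    return tree["left"] if x[tree["feature"]] <= tree["threshold"] else tree["right"]


def predict_forest(trees, x):
    """Majority vote over all trees in the forest."""
    return Counter(predict_tree(t, x) for t in trees).most_common(1)[0][0]


# Two hypothetical domains whose attacks show up on different features.
rng = random.Random(0)
domain_a = ([([rng.random(), rng.random()], "benign") for _ in range(20)]
            + [([2 + rng.random(), rng.random()], "attack") for _ in range(20)])
domain_b = ([([rng.random(), rng.random()], "benign") for _ in range(20)]
            + [([rng.random(), 2 + rng.random()], "attack") for _ in range(20)])

ledger = Ledger()
for name, rows in (("A", domain_a), ("B", domain_b)):
    for _ in range(5):
        ledger.publish(name, train_stump(rows, rng))  # share the model, not the data

forest = ledger.trees()          # local forest built from all shared trees
label = predict_forest(forest, [0.5, 0.5])
```

In this sketch every domain simply adopts all published trees. A real system would more plausibly validate shared trees against local data before adding them to its forest, and would rely on a consensus protocol rather than a single in-memory list.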
