Federated Forest

Most real-world data are scattered across different companies or government organizations, and cannot be easily integrated under data privacy and related regulations such as the European Union's General Data Protection Regulation (GDPR) and China' Cyber Security Law. Such data islands situation and data privacy & security are two major challenges for applications of artificial intelligence. In this paper, we tackle these challenges and propose a privacy-preserving machine learning model, called Federated Forest, which is a lossless learning model of the traditional random forest method, i.e., achieving the same level of accuracy as the non-privacy-preserving approach. Based on it, we developed a secure cross-regional machine learning system that allows a learning process to be jointly trained over different regions' clients with the same user samples but different attribute sets, processing the data stored in each of them without exchanging their raw data. A novel prediction algorithm was also proposed which could largely reduce the communication overhead. Experiments on both real-world and UCI data sets demonstrate the performance of the Federated Forest is as accurate as the non-federated version. The efficiency and robustness of our proposed system had been verified. Overall, our model is practical, scalable and extensible for real-life tasks.

[1]  Richard Nock,et al.  Private federated learning on vertically partitioned data via entity resolution and additively homomorphic encryption , 2017, ArXiv.

[2]  Tianjian Chen,et al.  A Secure Federated Transfer Learning Framework , 2020, IEEE Intelligent Systems.

[3]  Qiang Yang,et al.  SecureBoost: A Lossless Federated Learning Framework , 2019, IEEE Intelligent Systems.

[4]  Leo Breiman,et al.  Bagging Predictors , 1996, Machine Learning.

[5]  Qiang Yang,et al.  Federated Machine Learning , 2019, ACM Trans. Intell. Syst. Technol..

[6]  Sangwook Kim,et al.  Privacy-Preserving Naive Bayes Classification Using Fully Homomorphic Encryption , 2018, ICONIP.

[7]  K. Hamidieh A data-driven statistical model for predicting the critical temperature of a superconductor , 2018, Computational Materials Science.

[8]  Zhenguo Li,et al.  Federated Meta-Learning for Recommendation , 2018, ArXiv.

[9]  Yang Liu,et al.  Secure Federated Transfer Learning , 2018, ArXiv.

[10]  Craig Gentry,et al.  A fully homomorphic encryption scheme , 2009 .

[11]  Xiaoqian Jiang,et al.  Secure Logistic Regression Based on Homomorphic Encryption: Design and Evaluation , 2018, IACR Cryptol. ePrint Arch..

[12]  Hesham El-Rewini,et al.  Message Passing Interface (MPI) , 2005 .

[13]  Wei-Yin Loh,et al.  Classification and regression trees , 2011, WIREs Data Mining Knowl. Discov..

[14]  Aysegul Gunduz,et al.  A comparative analysis of speech signal processing algorithms for Parkinson's disease classification and the use of the tunable Q-factor wavelet transform , 2019, Appl. Soft Comput..

[15]  Ameet Talwalkar,et al.  Federated Multi-Task Learning , 2017, NIPS.

[16]  Peter Richtárik,et al.  Federated Optimization: Distributed Machine Learning for On-Device Intelligence , 2016, ArXiv.

[17]  H. Brendan McMahan,et al.  Learning Differentially Private Recurrent Language Models , 2017, ICLR.

[18]  Sebastian Caldas,et al.  LEAF: A Benchmark for Federated Settings , 2018, ArXiv.

[19]  Blaise Agüera y Arcas,et al.  Communication-Efficient Learning of Deep Networks from Decentralized Data , 2016, AISTATS.

[20]  Cynthia Dwork,et al.  Differential Privacy , 2006, ICALP.

[21]  Richard Nock,et al.  Entity Resolution and Federated Learning get a Federated Resolution , 2018, ArXiv.

[22]  Tassilo Klein,et al.  Differentially Private Federated Learning: A Client Level Perspective , 2017, ArXiv.

[23]  Peter Richtárik,et al.  Federated Learning: Strategies for Improving Communication Efficiency , 2016, ArXiv.

[24]  Shiho Moriai,et al.  Privacy-Preserving Deep Learning via Additively Homomorphic Encryption , 2018, IEEE Transactions on Information Forensics and Security.

[25]  Shenjian Chen,et al.  Message Passing Interface (MPI) , 2011, Encyclopedia of Parallel Computing.

[26]  Hao Deng,et al.  LoAdaBoost: Loss-Based AdaBoost Federated Machine Learning on medical Data , 2018, ArXiv.

[27]  Paulo Cortez,et al.  A Proactive Intelligent Decision Support System for Predicting the Popularity of Online News , 2015, EPIA.