A Light-Weight Crowdsourcing Aggregation in Privacy-Preserving Federated Learning System

Federated Machine Learning (FML) sheds light on secure distributed machine learning. However, generic FML methods may leak private information through the sharing of training information of individual models, and they perform relatively poorly when the training datasets for individual models are biased and diversified. This is a problem when combining models trained in different IoT device scenarios, since the available training datasets are usually limited and biased. To tackle this problem, we propose a novel approach to accurately ensemble the results from different models on distributed edge devices. Instead of passing around the training information of individual models, which requires a relatively large amount of bandwidth and compromises data privacy, we suggest employing a trusted central agent that collects only the inference results from the edge devices. Then, based on a limited amount of labeled data, the agent runs a purpose-built statistical iterative crowdsourcing algorithm that combines these results into a more accurate aggregated prediction for a user query. Our proposed system model, the "Privacy-Preserving Federated Learning System", together with our light-weight Secure Crowdsourcing Aggregation (SC-Agg) algorithm, provides a more accurate prediction for outside queries at little cost and without any prior knowledge of what query will be submitted. We experimentally verify that, in our system, SC-Agg consistently outperforms both the majority voting method and the best-performing model of the ensemble in all testing scenarios. We believe that SC-Agg fits real-world IoT applications better than other methods, such as vanilla majority voting, because of its robustness and better performance.
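
To make the aggregation setting concrete, the following is a minimal sketch of a central agent that receives only predicted labels from edge models and combines them. It is not the SC-Agg algorithm, whose details are not given here: it is a simplified, non-iterative stand-in that weights each model by the log-odds of its accuracy on a small labeled set, and it assumes binary labels. The function names `majority_vote`, `estimate_weights`, and `weighted_vote` are hypothetical and chosen only for illustration.

```python
import math
from collections import Counter


def majority_vote(votes):
    """Baseline: return the label predicted by the most models."""
    return Counter(votes).most_common(1)[0][0]


def estimate_weights(predictions, labels, eps=1e-3):
    """Estimate one reliability weight per model from a small labeled set.

    predictions[m][i] is model m's predicted label for labeled item i.
    The weight is the log-odds of the model's accuracy (an assumption of
    this sketch, not the paper's estimator); accuracy is clipped to
    [eps, 1 - eps] so the logarithm stays finite.
    """
    weights = []
    for preds in predictions:
        acc = sum(p == y for p, y in zip(preds, labels)) / len(labels)
        acc = min(max(acc, eps), 1 - eps)
        weights.append(math.log(acc / (1 - acc)))
    return weights


def weighted_vote(votes, weights):
    """Return the label with the largest total weight among the votes."""
    scores = {}
    for vote, weight in zip(votes, weights):
        scores[vote] = scores.get(vote, 0.0) + weight
    return max(scores, key=scores.get)


# Hypothetical labeled data held by the agent, and each model's answers on it.
labeled_truth = [1, 0, 1, 1]
model_outputs = [
    [1, 0, 1, 1],  # model A: always right on the labeled set
    [0, 0, 0, 1],  # model B: right half of the time
    [0, 1, 1, 0],  # model C: right a quarter of the time
]
weights = estimate_weights(model_outputs, labeled_truth)

# A new query: the agent sees only the three predicted labels.
query_votes = [1, 0, 0]
baseline_label = majority_vote(query_votes)
weighted_label = weighted_vote(query_votes, weights)
```

In this toy case the two unreliable models outvote the reliable one under majority voting, while the weighted vote follows the model that performed best on the labeled data. An iterative scheme of the kind the abstract describes would refine such reliability estimates over several rounds rather than computing them once.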
