Data-Free Evaluation of User Contributions in Federated Learning

Federated learning (FL) trains a machine learning model on mobile devices in a distributed manner, using each device's private data and computing resources. A critical issue is evaluating individual users' contributions so that (1) users' effort in model training can be compensated with proper incentives and (2) malicious and low-quality users can be detected and removed. State-of-the-art solutions require a representative test dataset for evaluation, but such a dataset is often unavailable and hard to synthesize. In this paper, we propose a method called Pairwise Correlated Agreement (PCA), based on the idea of peer prediction, to evaluate user contributions in FL without a test dataset. PCA achieves this using the statistical correlation of the model parameters uploaded by users. We then apply PCA to design (1) a new federated learning algorithm called Fed-PCA and (2) a new incentive mechanism that guarantees truthfulness. We evaluate the performance of PCA and Fed-PCA on the MNIST dataset and a large industrial product recommendation dataset. The results demonstrate that Fed-PCA outperforms the canonical FedAvg algorithm and other baseline methods in accuracy and that, at the same time, PCA effectively incentivizes users to behave truthfully.
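
The abstract does not spell out how PCA computes its scores, so the following is only an illustrative sketch of the general idea of scoring users by the correlation of their uploaded parameters, with no test dataset involved. It is not the paper's algorithm. The function names (`rank`, `agreement_scores`), the choice of Spearman rank correlation, and the synthetic data are all assumptions made for illustration.

```python
import numpy as np

def rank(v):
    # Ordinal ranks of a 1-D vector (assumes continuous values, so ties are unlikely).
    order = np.argsort(v)
    ranks = np.empty(len(v), dtype=float)
    ranks[order] = np.arange(len(v))
    return ranks

def agreement_scores(updates):
    """Hypothetical scoring rule: mean Spearman correlation with all other users.

    updates: array of shape (n_users, n_params), one uploaded parameter
    vector per user. Returns an array of shape (n_users,).
    """
    ranked = np.vstack([rank(u) for u in updates])
    corr = np.corrcoef(ranked)              # pairwise correlation of rank vectors
    n = corr.shape[0]
    return (corr.sum(axis=1) - 1.0) / (n - 1)   # drop the self-correlation of 1

# Synthetic example (assumed data): four users upload noisy copies of a
# shared signal, and a fifth user uploads pure noise.
rng = np.random.default_rng(0)
signal = rng.normal(size=500)
honest = signal + 0.3 * rng.normal(size=(4, 500))
noisy = rng.normal(size=(1, 500))
scores = agreement_scores(np.vstack([honest, noisy]))
```

In this toy setting the uninformative fifth user receives the lowest score, which is the kind of signal a server could use to down-weight or exclude an update. The actual PCA scoring rule and its truthfulness guarantee are defined in the paper itself.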
