Salvaging Federated Learning by Local Adaptation

Federated learning (FL) is a heavily promoted approach for training ML models on sensitive data, e.g., text typed by users on their smartphones. FL is expressly designed for training on data that are unbalanced and non-iid across the participants. To ensure privacy and integrity of the federated model, latest FL approaches use differential privacy or robust aggregation to limit the influence of "outlier" participants. First, we show that on standard tasks such as next-word prediction, many participants gain no benefit from FL because the federated model is less accurate on their data than the models they can train locally on their own. Second, we show that differential privacy and robust aggregation make this problem worse by further destroying the accuracy of the federated model for many participants. Then, we evaluate three techniques for local adaptation of federated models: fine-tuning, multi-task learning, and knowledge distillation. We analyze where each technique is applicable and demonstrate that all participants benefit from local adaptation. Participants whose local models are poor obtain big accuracy improvements over conventional FL. Participants whose local models are better than the federated model and who have no incentive to participate in FL today improve less, but sufficiently to make the adapted federated model better than their local models.

[1]  Kai Yu,et al.  Cluster adaptive training for deep neural network , 2015, 2015 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).

[2]  Khe Chai Sim,et al.  Factorized Hidden Layer Adaptation for Deep Neural Network Based Acoustic Modeling , 2016, IEEE/ACM Transactions on Audio, Speech, and Language Processing.

[3]  Prateek Mittal,et al.  Analyzing Federated Learning through an Adversarial Lens , 2018, ICML.

[4]  Virendra J. Marathe,et al.  Private Federated Learning with Domain Adaptation , 2019, ArXiv.

[5]  Ameet Talwalkar,et al.  Federated Multi-Task Learning , 2017, NIPS.

[6]  Lili Su,et al.  Distributed Statistical Machine Learning in Adversarial Settings , 2017, Proc. ACM Meas. Anal. Comput. Syst..

[7]  Aryan Mokhtari,et al.  Personalized Federated Learning: A Meta-Learning Approach , 2020, ArXiv.

[8]  David Rolnick,et al.  How to Start Training: The Effect of Initialization and Architecture , 2018, NeurIPS.

[9]  Sarvar Patel,et al.  Practical Secure Aggregation for Privacy-Preserving Machine Learning , 2017, IACR Cryptol. ePrint Arch..

[10]  Hubert Eichner,et al.  Federated Evaluation of On-device Personalization , 2019, ArXiv.

[11]  Cynthia Dwork,et al.  Differential Privacy , 2006, ICALP.

[12]  Nguyen H. Tran,et al.  Personalized Federated Learning with Moreau Envelopes , 2020, NeurIPS.

[13]  Richard Nock,et al.  Advances and Open Problems in Federated Learning , 2021, Found. Trends Mach. Learn..

[14]  Hubert Eichner,et al.  Towards Federated Learning at Scale: System Design , 2019, SysML.

[15]  Hongyi Wang,et al.  DETOX: A Redundancy-based Framework for Faster and More Robust Gradient Aggregation , 2019, NeurIPS.

[16]  Rachid Guerraoui,et al.  AGGREGATHOR: Byzantine Machine Learning via Robust Gradient Aggregation , 2019, SysML.

[17]  Vitaly Shmatikov,et al.  Membership Inference Attacks Against Machine Learning Models , 2016, 2017 IEEE Symposium on Security and Privacy (SP).

[18]  Sreeram Kannan,et al.  Improving Federated Learning Personalization via Model Agnostic Meta Learning , 2019, ArXiv.

[19]  Kannan Ramchandran,et al.  Byzantine-Robust Distributed Learning: Towards Optimal Statistical Rates , 2018, ICML.

[20]  Vitaly Shmatikov,et al.  Differential Privacy Has Disparate Impact on Model Accuracy , 2019, NeurIPS.

[21]  Vitaly Shmatikov,et al.  How To Backdoor Federated Learning , 2018, AISTATS.

[22]  Tzu-Ming Harry Hsu,et al.  Measuring the Effects of Non-Identical Data Distribution for Federated Visual Classification , 2019, ArXiv.

[23]  Yasaman Khazaeni,et al.  Bayesian Nonparametric Federated Learning of Neural Networks , 2019, ICML.

[24]  Razvan Pascanu,et al.  Overcoming catastrophic forgetting in neural networks , 2016, Proceedings of the National Academy of Sciences.

[25]  Alex Krizhevsky,et al.  Learning Multiple Layers of Features from Tiny Images , 2009 .

[26]  Florian Metze,et al.  Speaker Adaptive Training of Deep Neural Network Acoustic Models Using I-Vectors , 2015, IEEE/ACM Transactions on Audio, Speech, and Language Processing.

[27]  H. Brendan McMahan,et al.  Learning Differentially Private Recurrent Language Models , 2017, ICLR.

[28]  Rachid Guerraoui,et al.  The Hidden Vulnerability of Distributed Learning in Byzantium , 2018, ICML.

[29]  Vitaly Shmatikov,et al.  Exploiting Unintended Feature Leakage in Collaborative Learning , 2018, 2019 IEEE Symposium on Security and Privacy (SP).

[30]  Hubert Eichner,et al.  Federated Learning for Mobile Keyboard Prediction , 2018, ArXiv.

[31]  Ameet Talwalkar,et al.  One-Shot Federated Learning , 2019, ArXiv.

[32]  R. French Catastrophic forgetting in connectionist networks , 1999, Trends in Cognitive Sciences.

[33]  Geoffrey E. Hinton,et al.  Distilling the Knowledge in a Neural Network , 2015, ArXiv.

[34]  Ji Wu,et al.  Rapid adaptation for deep neural networks through multi-task learning , 2015, INTERSPEECH.

[35]  Dietrich Klakow,et al.  Adversarial Initialization - when your network performs the way I want , 2019, ArXiv.

[36]  On the security relevance of weights in deep learning. , 2020 .

[37]  Rachid Guerraoui,et al.  Machine Learning with Adversaries: Byzantine Tolerant Gradient Descent , 2017, NIPS.

[38]  Rachid Guerraoui,et al.  Personalized and Private Peer-to-Peer Machine Learning , 2017, AISTATS.

[39]  Zhiwei Steven Wu,et al.  Distributed Training with Heterogeneous Data: Bridging Median- and Mean-Based Algorithms , 2019, NeurIPS.

[40]  Blaise Agüera y Arcas,et al.  Communication-Efficient Learning of Deep Networks from Decentralized Data , 2016, AISTATS.

[41]  Vitaly Shmatikov,et al.  Machine Learning Models that Remember Too Much , 2017, CCS.

[42]  Dong Yu,et al.  Recent progresses in deep learning based acoustic models , 2017, IEEE/CAA Journal of Automatica Sinica.

[43]  Ian Goodfellow,et al.  Deep Learning with Differential Privacy , 2016, CCS.

[44]  Cynthia Dwork,et al.  Differential Privacy: A Survey of Results , 2008, TAMC.