Salvaging Federated Learning by Local Adaptation

Federated learning (FL) is a heavily promoted approach for training ML models on sensitive data, e.g., text typed by users on their smartphones. FL is expressly designed for training on data that are unbalanced and non-iid across the participants. To ensure the privacy and integrity of the federated model, the latest FL approaches use differential privacy or robust aggregation to limit the influence of "outlier" participants. First, we show that on standard tasks such as next-word prediction, many participants gain no benefit from FL because the federated model is less accurate on their data than the models they can train locally on their own. Second, we show that differential privacy and robust aggregation make this problem worse by further degrading the accuracy of the federated model for many participants. We then evaluate three techniques for local adaptation of federated models: fine-tuning, multi-task learning, and knowledge distillation. We analyze where each technique is applicable and demonstrate that all participants benefit from local adaptation. Participants whose local models are poor obtain large accuracy improvements over conventional FL. Participants whose local models are better than the federated model, and who therefore have no incentive to participate in FL today, improve less, but enough to make the adapted federated model better than their local models.
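
For concreteness, a minimal PyTorch sketch of two of these adaptation techniques follows: fine-tuning a copy of the federated model on a participant's own data, and knowledge distillation from the federated model (teacher) into a local model (student); multi-task learning is omitted for brevity. The model architecture, hyperparameters, and toy data below are illustrative assumptions, not the paper's experimental setup.

    # Local adaptation of a federated model: fine-tuning and knowledge
    # distillation. Illustrative sketch; all models/data are toy stand-ins.
    import copy

    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    def fine_tune(federated_model: nn.Module, loader, epochs=3, lr=1e-3):
        """Continue training a copy of the federated model on local data only."""
        local = copy.deepcopy(federated_model)  # leave the global model intact
        opt = torch.optim.SGD(local.parameters(), lr=lr)
        local.train()
        for _ in range(epochs):
            for x, y in loader:
                opt.zero_grad()
                F.cross_entropy(local(x), y).backward()
                opt.step()
        return local

    def distill(teacher: nn.Module, student: nn.Module, loader,
                epochs=3, lr=1e-3, T=2.0, alpha=0.5):
        """Train a local student on soft teacher labels plus hard local labels."""
        opt = torch.optim.SGD(student.parameters(), lr=lr)
        teacher.eval()
        student.train()
        for _ in range(epochs):
            for x, y in loader:
                with torch.no_grad():
                    soft = F.softmax(teacher(x) / T, dim=1)  # softened teacher output
                logits = student(x)
                # KL term on temperature-softened logits (Hinton et al. [3]),
                # combined with ordinary cross-entropy on the local hard labels.
                kd = F.kl_div(F.log_softmax(logits / T, dim=1), soft,
                              reduction="batchmean") * T * T
                ce = F.cross_entropy(logits, y)
                opt.zero_grad()
                (alpha * kd + (1 - alpha) * ce).backward()
                opt.step()
        return student

    # Toy usage: a linear classifier on random data stands in for the
    # federated next-word-prediction model discussed in the abstract.
    torch.manual_seed(0)
    xs, ys = torch.randn(64, 10), torch.randint(0, 3, (64,))
    loader = [(xs[i:i + 16], ys[i:i + 16]) for i in range(0, 64, 16)]
    federated = nn.Linear(10, 3)
    adapted = fine_tune(federated, loader)
    local_student = distill(federated, nn.Linear(10, 3), loader)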
