Continual Local Training For Better Initialization Of Federated Models

Federated learning (FL) is a learning paradigm that trains machine learning models directly on decentralized systems of smart edge devices without transmitting the raw data, thereby avoiding heavy communication costs and privacy concerns. Because the data distributions in such systems are typically heterogeneous, the popular FL algorithm Federated Averaging (FedAvg) suffers from weight divergence and thus cannot achieve competitive performance of the global model (denoted as the initial performance in FL) compared with centralized methods. In this paper, we propose a continual local training strategy to address this problem. Importance weights are evaluated on a small proxy dataset on the central server and then used to constrain the local training. With this additional term, we alleviate the weight divergence and continually integrate the knowledge from different local clients into the global model, which ensures better generalization ability. Experiments on various FL settings demonstrate that our method significantly improves the initial performance of federated models with little extra communication cost.
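
The abstract describes an importance-weighted penalty added to each client's local objective, with the importance weights computed on a small server-side proxy dataset. Below is a minimal PyTorch-style sketch of such a constrained local update; the function names (compute_importance, local_train_step), the squared-gradient importance estimate, the quadratic penalty form, and the hyperparameter lam are illustrative assumptions, not the paper's exact formulation.

```python
import torch
import torch.nn.functional as F

def compute_importance(global_model, proxy_loader):
    """Estimate per-parameter importance on a small server-side proxy dataset.
    Assumed form: averaged squared gradients of the loss (EWC-style)."""
    importance = {n: torch.zeros_like(p) for n, p in global_model.named_parameters()}
    global_model.eval()
    for x, y in proxy_loader:
        global_model.zero_grad()
        loss = F.cross_entropy(global_model(x), y)
        loss.backward()
        for n, p in global_model.named_parameters():
            if p.grad is not None:
                importance[n] += p.grad.detach() ** 2
    return {n: v / max(len(proxy_loader), 1) for n, v in importance.items()}

def local_train_step(local_model, global_params, importance, batch, lam=1.0):
    """One local training loss with a continual-training penalty that discourages
    drifting away from important global weights (lam is a hypothetical knob)."""
    x, y = batch
    task_loss = F.cross_entropy(local_model(x), y)
    penalty = 0.0
    for n, p in local_model.named_parameters():
        penalty = penalty + (importance[n] * (p - global_params[n]) ** 2).sum()
    return task_loss + lam * penalty
```

In this sketch, the server would broadcast the global weights together with the importance tensors each round; every client then minimizes local_train_step's output on its own data before the usual FedAvg aggregation.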
