Federated Continual Learning with Adaptive Parameter Communication

There has been a surge of interest in both continual learning and federated learning, each of which is important for training deep neural networks in real-world scenarios. Yet little research has addressed the scenario in which each client learns a sequence of tasks from private local data. This problem of federated continual learning poses new challenges beyond those of standard continual learning, such as leveraging knowledge from tasks learned on other clients while preventing interference from them. To resolve these issues, we propose a novel federated continual learning framework, Federated Continual Learning with Adaptive Parameter Communication (APC), which additively decomposes the network weights into global shared parameters and sparse task-specific parameters. This decomposition minimizes interference between incompatible tasks and enables inter-client knowledge transfer by communicating the sparse task-specific parameters. Our framework is also communication-efficient, owing to the high sparsity of the task-specific parameters and the resulting sparse parameter updates. We validate APC against existing federated learning and local continual learning methods under varying degrees of task similarity across clients, and show that our model significantly outperforms them while greatly reducing communication cost.
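
The following is a minimal sketch of the additive weight decomposition described above, assuming a PyTorch-style linear layer. The class name `DecomposedLinear`, the top-k magnitude sparsification rule, and the `sparsity` parameter are illustrative assumptions, not the paper's exact formulation.

```python
# Sketch: effective weight = globally shared weight + sparse task-specific
# weight. Illustrative only; the paper's exact parameterization and
# sparsity-inducing mechanism may differ (e.g., an L1 penalty).
import torch
import torch.nn as nn
import torch.nn.functional as F

class DecomposedLinear(nn.Module):
    """Linear layer whose weight is (shared global part) + (sparse task part)."""
    def __init__(self, in_features, out_features, sparsity=0.9):
        super().__init__()
        # Shared across clients/tasks; aggregated by the server.
        self.shared = nn.Parameter(0.01 * torch.randn(out_features, in_features))
        # Learned per task; starts at zero and is kept highly sparse so it
        # is cheap to communicate to the server and other clients.
        self.task = nn.Parameter(torch.zeros(out_features, in_features))
        self.sparsity = sparsity  # fraction of task-specific entries zeroed

    def sparse_task_weight(self):
        # Hypothetical top-k magnitude pruning of the task-specific weights;
        # ties at the threshold may keep a few extra entries.
        k = max(1, int((1.0 - self.sparsity) * self.task.numel()))
        threshold = self.task.abs().flatten().topk(k).values.min()
        mask = (self.task.abs() >= threshold).float()
        return self.task * mask

    def forward(self, x):
        return F.linear(x, self.shared + self.sparse_task_weight())

# Only the sparse task-specific part is transmitted, so per-round
# communication cost scales with (1 - sparsity) rather than model size.
layer = DecomposedLinear(784, 100)
_ = layer(torch.randn(32, 784))
payload = layer.sparse_task_weight().to_sparse()  # compact tensor to send
```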
