Flexible Clustered Federated Learning for Client-Level Data Distribution Shift

Federated Learning (FL) enables multiple participating devices to collaboratively train a global neural network model while keeping the training data local. Unlike the centralized training setting, the training data in FL is distributed across the federated network and is typically non-IID, imbalanced (statistical heterogeneity), and subject to distribution shift, which increases the divergence between the local models and the global model and further degrades performance. In this paper, we propose a flexible clustered federated learning (CFL) framework named FlexCFL, in which we 1) group the training of clients based on the similarities between the clients' optimization directions to reduce training divergence; 2) implement an efficient cold-start mechanism for newcomer devices to improve the framework's scalability and practicality; 3) flexibly migrate clients to meet the challenge of client-level data distribution shift. FlexCFL achieves these improvements by dividing the joint optimization into groups of sub-optimization problems, and it strikes a balance between accuracy and communication efficiency under distribution shift. We analyze the convergence and complexity of FlexCFL to demonstrate its efficiency, and we evaluate it on several open datasets against related CFL frameworks. The results show that FlexCFL significantly improves absolute test accuracy: +10.6% on FEMNIST compared to FedAvg, +3.5% on FashionMNIST compared to FedProx, and +8.4% on MNIST compared to FeSEM. The experiments also show that FlexCFL remains communication-efficient in distribution shift environments.
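
The following is a minimal sketch (not the authors' implementation) of the grouping and cold-start ideas described above: clients are clustered by the similarity of their local update directions after a low-rank projection, and a newcomer is assigned to the group with the nearest centroid. All names (`group_clients`, `assign_newcomer`, `svd_rank`, etc.) and the specific choice of truncated SVD plus k-means are illustrative assumptions, not the paper's exact algorithm.

```python
# Illustrative sketch of clustering clients by update-direction similarity.
# Assumed helpers and parameters; not FlexCFL's actual implementation.
import numpy as np
from sklearn.cluster import KMeans


def group_clients(client_updates: np.ndarray, n_groups: int, svd_rank: int = 5):
    """Cluster clients by their (flattened) local model updates.

    client_updates: array of shape (n_clients, n_params), one update vector per client.
    A truncated SVD reduces the high-dimensional updates before k-means, roughly in
    the spirit of a decomposed, data-driven similarity measure.
    """
    # Normalize so clustering reflects direction rather than magnitude.
    normed = client_updates / (np.linalg.norm(client_updates, axis=1, keepdims=True) + 1e-12)
    # Low-rank projection of the update matrix.
    _, _, vt = np.linalg.svd(normed, full_matrices=False)
    proj = normed @ vt[:svd_rank].T
    km = KMeans(n_clusters=n_groups, n_init=10, random_state=0).fit(proj)
    return km, vt[:svd_rank]


def assign_newcomer(km: KMeans, basis: np.ndarray, newcomer_update: np.ndarray) -> int:
    """Cold start: place a new client into the group with the nearest centroid."""
    v = newcomer_update / (np.linalg.norm(newcomer_update) + 1e-12)
    return int(km.predict((v @ basis.T).reshape(1, -1))[0])


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    updates = rng.normal(size=(20, 100))  # toy "local updates" for 20 clients
    km, basis = group_clients(updates, n_groups=3)
    print("group of newcomer:", assign_newcomer(km, basis, rng.normal(size=100)))
```

Under the same assumptions, client migration for distribution shift could be handled by periodically recomputing each client's projected update and reassigning it with `assign_newcomer` when it drifts closer to another group's centroid.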
