FedH2L: Federated Learning with Model and Statistical Heterogeneity

Federated learning (FL) enables distributed participants to collectively learn a strong global model without sacrificing their individual data privacy. Mainstream FL approaches require each participant to share a common network architecture and further assume that data are sampled IID across participants. However, in real-world deployments participants may require heterogeneous network architectures, and the data distribution is almost certainly non-uniform. To address these issues we introduce FedH2L, which is agnostic to the model architecture and robust to different data distributions across participants. In contrast to approaches that share parameters or gradients, FedH2L relies on mutual distillation, exchanging only posteriors on a shared seed set between participants in a decentralized manner. This makes it extremely bandwidth efficient and model agnostic, and, crucially, it produces models that perform well on the whole data distribution when learning from heterogeneous silos.
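
To make the exchange concrete, below is a minimal sketch of one mutual-distillation update, assuming a PyTorch setup. All names here (`local_model`, `peer_posteriors`, `seed_inputs`, `temperature`) are hypothetical illustrations, not the paper's API, and the sketch omits the decentralized communication round and the participant's local supervised objective.

```python
import torch
import torch.nn.functional as F

def mutual_distillation_step(local_model, peer_posteriors, seed_inputs,
                             optimizer, temperature=2.0):
    """Hedged sketch of a FedH2L-style distillation update.

    Each participant shares only class posteriors computed on a common
    seed set, so peers may use arbitrary architectures: no parameters
    or gradients ever cross the network. `peer_posteriors` is a list of
    (num_seed, num_classes) probability tensors received from peers.
    """
    local_model.train()
    logits = local_model(seed_inputs)                       # local predictions on the shared seed set
    log_probs = F.log_softmax(logits / temperature, dim=1)  # temperature-softened local posterior

    # Match the local posterior to each peer's posterior via KL divergence,
    # averaged over peers.
    distill_loss = torch.zeros((), device=logits.device)
    for peer_probs in peer_posteriors:
        distill_loss = distill_loss + F.kl_div(
            log_probs, peer_probs, reduction="batchmean")
    distill_loss = distill_loss / max(len(peer_posteriors), 1)

    optimizer.zero_grad()
    distill_loss.backward()
    optimizer.step()
    return distill_loss.item()
```

Because only a `(num_seed, num_classes)` posterior matrix is exchanged per round, the communication cost is independent of model size, which is the source of the bandwidth-efficiency and model-agnosticism claims above.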
