Rethinking Architecture Design for Tackling Data Heterogeneity in Federated Learning

Federated learning is an emerging research paradigm that enables collaborative training of machine learning models across organizations while keeping data private at each institution. Despite recent progress, fundamental challenges remain, such as a lack of convergence and the risk of catastrophic forgetting across real-world heterogeneous devices. In this paper, we demonstrate that self-attention-based architectures (e.g., Transformers) are more robust to distribution shifts and hence improve federated learning over heterogeneous data. Concretely, we conduct the first rigorous empirical investigation of different neural architectures across a range of federated algorithms, real-world benchmarks, and heterogeneous data splits. Our experiments show that simply replacing convolutional networks with Transformers can greatly reduce catastrophic forgetting of previously seen devices, accelerate convergence, and reach a better global model, especially on heterogeneous data. We release our code and pretrained models to encourage future exploration of robust architectures as an alternative to current research efforts on the optimization front.
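
The comparison the abstract describes is easy to picture in code. Below is a minimal, hypothetical PyTorch sketch (not the paper's released implementation): a FedAvg loop in which the only thing that changes between the convolutional and Transformer conditions is the model returned by `build_model`. The names `build_model`, `TinyViT`, `local_update`, and `fedavg_round` are illustrative, and both models are toy-scale stand-ins for the architectures the paper actually benchmarks.

```python
# Hypothetical minimal sketch: FedAvg (McMahan et al., 2017) with the client
# architecture as the only moving part. Illustrative names, not the paper's API.
import copy
import torch
import torch.nn as nn

class TinyViT(nn.Module):
    """A toy self-attention model: patch embedding + Transformer encoder."""
    def __init__(self, patch=8, dim=64, num_classes=10):
        super().__init__()
        self.embed = nn.Conv2d(3, dim, kernel_size=patch, stride=patch)
        layer = nn.TransformerEncoderLayer(d_model=dim, nhead=4, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=2)
        self.head = nn.Linear(dim, num_classes)

    def forward(self, x):  # x: (B, 3, H, W)
        tokens = self.embed(x).flatten(2).transpose(1, 2)  # (B, N, dim)
        return self.head(self.encoder(tokens).mean(dim=1))

def build_model(arch="cnn"):
    """Stand-ins for the two architecture families being compared."""
    if arch == "cnn":
        return nn.Sequential(
            nn.Conv2d(3, 16, 3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(16, 10))
    return TinyViT()

def local_update(global_model, loader, epochs=1, lr=0.01):
    """One client's local SGD pass, starting from the global weights."""
    model = copy.deepcopy(global_model)
    opt = torch.optim.SGD(model.parameters(), lr=lr)
    loss_fn = nn.CrossEntropyLoss()
    for _ in range(epochs):
        for x, y in loader:
            opt.zero_grad()
            loss_fn(model(x), y).backward()
            opt.step()
    return model.state_dict()

def fedavg_round(global_model, client_loaders):
    """Uniform parameter-wise average of client updates (FedAvg proper
    weights each client by its local sample count)."""
    states = [local_update(global_model, dl) for dl in client_loaders]
    avg = {k: torch.stack([s[k] for s in states]).mean(dim=0) for k in states[0]}
    global_model.load_state_dict(avg)
    return global_model
```

Under this framing, running `fedavg_round` for several rounds over the same non-IID client loaders, once with `build_model("cnn")` and once with `build_model("vit")`, mirrors the paper's architecture swap at toy scale: the federated algorithm, data splits, and training budget are held fixed while only the architecture varies.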
