FedBN: Federated Learning on Non-IID Features via Local Batch Normalization

The emerging paradigm of federated learning (FL) strives to enable collaborative training of deep models on the network edge without centrally aggregating raw data and hence improving data privacy. In most cases, the assumption of independent and identically distributed samples across local clients does not hold for federated learning setups. Under this setting, neural network training performance may vary significantly according to the data distribution and even hurt training convergence. Most of the previous work has focused on a difference in the distribution of labels or client shifts. Unlike those settings, we address an important problem of FL, e.g., different scanners/sensors in medical imaging, different scenery distribution in autonomous driving (highway vs. city), where local clients store examples with different distributions compared to other clients, which we denote as feature shift non-iid. In this work, we propose an effective method that uses local batch normalization to alleviate the feature shift before averaging models. The resulting scheme, called FedBN, outperforms both classical FedAvg, as well as the state-of-the-art for non-iid data (FedProx) on our extensive experiments. These empirical results are supported by a convergence analysis that shows in a simplified setting that FedBN has a faster convergence rate than FedAvg. Code is available at https://github.com/med-air/FedBN.

[1]  Guido Mont'ufar,et al.  Optimization Theory for ReLU Neural Networks Trained with Normalization Layers , 2020, ICML.

[2]  Ali Jadbabaie,et al.  Robust Federated Learning: The Case of Affine Distribution Shifts , 2020, NeurIPS.

[3]  Eric W. Tramel,et al.  Siloed Federated Learning for Multi-Centric Histopathology Datasets , 2020, DART/DCL@MICCAI.

[4]  Lequan Yu,et al.  MS-Net: Multi-Site Network for Improving Prostate Segmentation With Heterogeneous MRI Data , 2020, IEEE Transactions on Medical Imaging.

[5]  Blaise Agüera y Arcas,et al.  Communication-Efficient Learning of Deep Networks from Decentralized Data , 2016, AISTATS.

[6]  Matthew Botvinick,et al.  On the importance of single directions for generalization , 2018, ICLR.

[7]  Barnabás Póczos,et al.  Gradient Descent Provably Optimizes Over-parameterized Neural Networks , 2018, ICLR.

[8]  Xiang Li,et al.  On the Convergence of FedAvg on Non-IID Data , 2019, ICLR.

[9]  Arthur Jacot,et al.  Neural tangent kernel: convergence and generalization in neural networks (invited paper) , 2018, NeurIPS.

[10]  Micah J. Sheller,et al.  The future of digital health with federated learning , 2020, npj Digital Medicine.

[11]  Wotao Yin,et al.  FedPD: A Federated Learning Framework with Optimal Rates and Adaptivity to Non-IID Data , 2020, ArXiv.

[12]  Aleksander Madry,et al.  How Does Batch Normalization Help Optimization? (No, It Is Not About Internal Covariate Shift) , 2018, NeurIPS.

[13]  G. Griffin,et al.  Caltech-256 Object Category Dataset , 2007 .

[14]  Omri Weinstein,et al.  Training (Overparametrized) Neural Networks in Near-Linear Time , 2021, ITCS.

[15]  Jonathan J. Hull,et al.  A Database for Handwritten Text Recognition Research , 1994, IEEE Trans. Pattern Anal. Mach. Intell..

[16]  Yuanzhi Li,et al.  A Convergence Theory for Deep Learning via Over-Parameterization , 2018, ICML.

[17]  Bo Wang,et al.  Moment Matching for Multi-Source Domain Adaptation , 2018, 2019 IEEE/CVF International Conference on Computer Vision (ICCV).

[18]  Sergey Ioffe,et al.  Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift , 2015, ICML.

[19]  Richard Nock,et al.  Advances and Open Problems in Federated Learning , 2021, Found. Trends Mach. Learn..

[20]  Ruosong Wang,et al.  Fine-Grained Analysis of Optimization and Generalization for Overparameterized Two-Layer Neural Networks , 2019, ICML.

[21]  Anit Kumar Sahu,et al.  Federated Learning: Challenges, Methods, and Future Directions , 2019, IEEE Signal Processing Magazine.

[22]  Sashank J. Reddi,et al.  SCAFFOLD: Stochastic Controlled Averaging for On-Device Federated Learning , 2019, ArXiv.

[23]  Yuan Shi,et al.  Geodesic flow kernel for unsupervised domain adaptation , 2012, 2012 IEEE Conference on Computer Vision and Pattern Recognition.

[24]  Trevor Darrell,et al.  Adapting Visual Category Models to New Domains , 2010, ECCV.

[25]  Stefan Wrobel,et al.  Efficient Decentralized Deep Learning by Dynamic Model Averaging , 2018, ECML/PKDD.

[26]  Ramesh Raskar,et al.  FedML: A Research Library and Benchmark for Federated Machine Learning , 2020, ArXiv.

[27]  Tim Salimans,et al.  Weight Normalization: A Simple Reparameterization to Accelerate Training of Deep Neural Networks , 2016, NIPS.

[28]  Daniel Rueckert,et al.  A generic framework for privacy preserving deep learning , 2018, ArXiv.

[29]  Yasaman Khazaeni,et al.  Federated Learning with Matched Averaging , 2020, ICLR.

[30]  Titouan Parcollet,et al.  Flower: A Friendly Federated Learning Research Framework , 2020, ArXiv.

[31]  Andrew Y. Ng,et al.  Reading Digits in Natural Images with Unsupervised Feature Learning , 2011 .

[32]  Jiaying Liu,et al.  Adaptive Batch Normalization for practical domain adaptation , 2018, Pattern Recognit..

[33]  Manzil Zaheer,et al.  Adaptive Federated Optimization , 2020, ICLR.

[34]  Victor S. Lempitsky,et al.  Unsupervised Domain Adaptation by Backpropagation , 2014, ICML.

[35]  Ping Luo,et al.  Towards Understanding Regularization in Batch Normalization , 2018, ICLR.

[36]  Yoshua Bengio,et al.  Gradient-based learning applied to document recognition , 1998, Proc. IEEE.

[37]  Phillip B. Gibbons,et al.  The Non-IID Data Quagmire of Decentralized Machine Learning , 2019, ICML.

[38]  Tzu-Ming Harry Hsu,et al.  Measuring the Effects of Non-Identical Data Distribution for Federated Visual Classification , 2019, ArXiv.

[39]  Daniel P. Kennedy,et al.  The Autism Brain Imaging Data Exchange: Towards Large-Scale Evaluation of the Intrinsic Brain Architecture in Autism , 2013, Molecular Psychiatry.

[40]  Thomas Hofmann,et al.  Exponential convergence rates for Batch Normalization: The power of length-direction decoupling in non-convex optimization , 2018, AISTATS.

[41]  Yue Zhao,et al.  Federated Learning with Non-IID Data , 2018, ArXiv.

[42]  Daniel C. Castro,et al.  Morpho-MNIST: Quantitative Assessment and Diagnostics for Representation Learning , 2018, J. Mach. Learn. Res..

[43]  Anit Kumar Sahu,et al.  Federated Optimization in Heterogeneous Networks , 2018, MLSys.

[44]  Bohyung Han,et al.  Domain-Specific Batch Normalization for Unsupervised Domain Adaptation , 2019, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[45]  Geoffrey E. Hinton,et al.  ImageNet classification with deep convolutional neural networks , 2012, Commun. ACM.

[46]  Jiaying Liu,et al.  Revisiting Batch Normalization For Practical Domain Adaptation , 2016, ICLR.