FedDNA: Federated Learning with Decoupled Normalization-Layer Aggregation for Non-IID Data

In the federated learning paradigm, multiple mobile clients train local models independently on datasets generated by edge devices, and a server aggregates the model parameters received from the clients to form a global model. Conventional methods aggregate gradient parameters and statistical parameters without distinction, which leads to large aggregation bias due to cross-model distribution covariate shift (CDCS) and results in a severe performance drop for federated learning under non-IID data. In this paper, we propose a novel decoupled parameter aggregation method, FedDNA, to address the performance issues caused by CDCS. With the proposed method, the gradient parameters are aggregated using conventional federated averaging, while the statistical parameters are aggregated with an importance weighting method that reduces the divergence between the local models and the central model, with the importance weights optimized collaboratively by an adversarial learning algorithm based on a variational autoencoder (VAE). Extensive experiments on various federated learning scenarios with four open datasets show that FedDNA achieves significant performance improvements over state-of-the-art methods.
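To make the decoupling concrete, the sketch below shows one way the server-side aggregation could be split, assuming a PyTorch setting: weight and bias parameters are combined by dataset-size-weighted federated averaging, while batch-normalization running statistics (the "statistical parameters") are combined with separate per-client importance weights. The helper name `decoupled_aggregate`, the key-matching heuristic, and the `stat_weights` argument are illustrative assumptions; in FedDNA itself the importance weights come from the VAE-based adversarial optimization, which is omitted here.

```python
import torch

def decoupled_aggregate(client_states, client_sizes, stat_weights):
    """Minimal sketch of decoupled aggregation (not the paper's exact code).

    client_states: list of client model state_dicts.
    client_sizes:  list of local dataset sizes, used for FedAvg weights.
    stat_weights:  per-client importance weights for statistical parameters;
                   in FedDNA these would be produced by the VAE-based
                   adversarial learning step, which this sketch omits.
    """
    total = float(sum(client_sizes))
    fedavg_weights = [n / total for n in client_sizes]

    global_state = {}
    for key in client_states[0]:
        # Heuristic assumption: BatchNorm running statistics are the
        # "statistical parameters"; all other entries are gradient-trained.
        is_statistical = "running_mean" in key or "running_var" in key
        weights = stat_weights if is_statistical else fedavg_weights
        global_state[key] = sum(
            w * state[key].float()
            for w, state in zip(weights, client_states)
        )
    return global_state
```

With uniform `stat_weights` this reduces to plain FedAvg over all parameters; the method's benefit under non-IID data comes from choosing non-uniform importance weights for the statistical parameters so that the aggregated running statistics stay close to each client's local distribution.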
