Bayesian Federated Learning over Wireless Networks

Federated learning is a privacy-preserving, distributed training method that uses heterogeneous data sets stored on local devices. Federated learning over wireless networks requires aggregating locally computed gradients at a server, where the mobile devices send statistically distinct gradient information over heterogeneous communication links. This paper proposes a Bayesian federated learning (BFL) algorithm that aggregates the heterogeneous quantized gradient information optimally in the sense of minimizing the mean-squared error (MSE). The idea of BFL is to aggregate the one-bit quantized local gradients at the server by jointly exploiting i) the prior distributions of the local gradients, ii) the gradient quantizer function, and iii) the channel distributions. Implementing BFL, however, incurs high communication and computational costs as the number of mobile devices increases. To address this challenge, we also present an efficient modified BFL algorithm called scalable-BFL (SBFL). In SBFL, we assume a simplified distribution for the local gradients, and each mobile device sends its one-bit quantized local gradient together with two scalar parameters representing this distribution. The server then aggregates the noisy and faded quantized gradients so as to minimize the MSE. We provide a convergence analysis of SBFL for a class of non-convex loss functions. Our analysis elucidates how the parameters of the communication channels and the gradient priors affect convergence. Simulations demonstrate that SBFL considerably outperforms the conventional sign stochastic gradient descent (signSGD) algorithm when training and testing neural networks on the MNIST data set over heterogeneous wireless networks.
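To make the SBFL-style aggregation step concrete, below is a minimal sketch, assuming a Gaussian prior N(mu_k, sigma_k^2) for each gradient entry on device k (the two scalar parameters sent with the sign bit), a noiseless channel, and a posterior-mean (MMSE) rule for recovering each entry from its sign. The function names (mmse_from_sign, aggregate) and the omission of fading and channel noise are illustrative assumptions; this is not the paper's exact estimator.

```python
# Illustrative sketch of MMSE aggregation of one-bit quantized gradients
# under a per-entry Gaussian prior. Assumes a noiseless channel; the SBFL
# algorithm in the paper additionally accounts for fading and noise.
import numpy as np
from scipy.stats import norm


def mmse_from_sign(sign_bit, mu, sigma):
    """Posterior mean E[g | sign(g)] for g ~ N(mu, sigma^2) (truncated Gaussian)."""
    z = mu / sigma
    if sign_bit > 0:
        # E[g | g > 0] = mu + sigma * phi(mu/sigma) / Phi(mu/sigma)
        return mu + sigma * norm.pdf(z) / norm.cdf(z)
    # E[g | g < 0] = mu - sigma * phi(mu/sigma) / (1 - Phi(mu/sigma))
    return mu - sigma * norm.pdf(z) / (1.0 - norm.cdf(z))


def aggregate(signs, mus, sigmas):
    """Server side: average the per-device MMSE estimates of a gradient entry."""
    estimates = [mmse_from_sign(s, m, sd) for s, m, sd in zip(signs, mus, sigmas)]
    return float(np.mean(estimates))


if __name__ == "__main__":
    # Toy usage: three devices report one sign bit each plus their prior parameters.
    signs = [+1, -1, +1]
    mus = [0.2, -0.1, 0.05]
    sigmas = [0.5, 0.3, 0.4]
    print("aggregated gradient estimate:", aggregate(signs, mus, sigmas))
```

Under these assumptions, the server weights each device's sign bit by how informative it is given that device's reported prior, rather than simply summing the signs as in signSGD with majority vote.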
