Federated Learning Over Wireless Fading Channels

We study federated machine learning at the wireless network edge, where power-limited wireless devices, each with its own dataset, collaboratively build a joint model with the help of a remote parameter server (PS). We consider a bandwidth-limited fading multiple access channel (MAC) from the wireless devices to the PS, and propose various techniques to implement distributed stochastic gradient descent (DSGD) over this shared noisy wireless channel. We first propose a digital DSGD (D-DSGD) scheme, in which a single device is selected opportunistically for transmission at each iteration based on the channel conditions; the scheduled device quantizes its gradient estimate to the finite number of bits the channel can support, and transmits these bits to the PS reliably. Next, motivated by the additive nature of the wireless MAC, we propose a novel analog communication scheme, referred to as compressed analog DSGD (CA-DSGD), in which the devices first sparsify their gradient estimates while accumulating the error from previous iterations, and then project the resulting sparse vector onto a low-dimensional vector to reduce bandwidth. We also design a power allocation scheme that efficiently aligns the received gradient vectors at the PS. Numerical results show that D-DSGD outperforms other digital approaches in the literature; however, the proposed CA-DSGD algorithm generally converges faster than D-DSGD and reaches a higher level of accuracy. We observe that the gap between the analog and digital schemes widens when the devices' datasets are not independent and identically distributed (i.i.d.). Furthermore, the performance of CA-DSGD is shown to be robust against imperfect channel state information (CSI) at the devices. Overall, these results demonstrate clear advantages of the proposed analog over-the-air DSGD scheme, and suggest that learning and communication algorithms should be designed jointly to achieve the best end-to-end performance in machine learning applications at the wireless edge.
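
To make the device-side CA-DSGD step concrete, the minimal Python sketch below illustrates the three compression ideas described above: error accumulation, sparsification, and projection onto a low-dimensional vector. The function name `ca_dsgd_compress`, the top-k sparsification rule, and the Gaussian projection matrix shared through a common seed are illustrative assumptions, not the paper's exact construction; the power allocation and over-the-air alignment at the PS are omitted.

```python
import numpy as np

def ca_dsgd_compress(grad, error, k, proj_matrix):
    """One device's compression step in a CA-DSGD-style scheme (illustrative sketch).

    grad        : local stochastic gradient estimate, shape (d,)
    error       : error accumulated from previous iterations, shape (d,)
    k           : number of gradient entries kept after sparsification
    proj_matrix : shared (m x d) random projection, m << d, known to the PS
    """
    # Error accumulation: add back what was not transmitted previously.
    corrected = grad + error

    # Top-k sparsification: keep only the k largest-magnitude entries.
    sparse = np.zeros_like(corrected)
    idx = np.argpartition(np.abs(corrected), -k)[-k:]
    sparse[idx] = corrected[idx]

    # Store the discarded residual for the next iteration.
    new_error = corrected - sparse

    # Project the sparse vector onto a low-dimensional vector so that it
    # fits the limited channel bandwidth (m channel uses instead of d).
    compressed = proj_matrix @ sparse
    return compressed, new_error


# Toy usage: a d-dimensional gradient compressed into m channel uses.
d, m, k = 10_000, 1_000, 100
rng = np.random.default_rng(0)
A = rng.standard_normal((m, d)) / np.sqrt(m)   # shared via a common seed
g = rng.standard_normal(d)
e = np.zeros(d)
x, e = ca_dsgd_compress(g, e, k, A)
print(x.shape)  # (1000,)
```

In an over-the-air setting, all devices would transmit their compressed vectors simultaneously, and the additive fading MAC would deliver their (power-aligned) sum to the PS; this is why the projection matrix must be common to all devices.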
