On the Design of Communication Efficient Federated Learning over Wireless Networks

Recently, federated learning (FL) has attracted considerable research attention as a promising distributed machine learning approach. In FL, the parameter server and the mobile devices exchange training parameters over wireless links, so reducing the communication overhead becomes one of the most critical challenges. Although various communication-efficient machine learning algorithms have been proposed in the literature, few existing works consider their implementation over wireless networks. In this work, the idea of SignSGD is adopted, and only the signs of the gradients are shared between the mobile devices and the parameter server. In addition, unlike most existing works, which assume channel state information (CSI) at both the transmitter and the receiver, only receiver-side CSI is assumed. In such a case, an essential problem for the mobile devices is to select appropriate local processing and communication parameters. In particular, two tradeoffs are observed under a fixed total training time: (i) given the time for each communication round, the energy consumption versus the outage probability per communication round, and (ii) given the energy consumption, the number of communication rounds versus the outage probability per communication round. Two optimization problems corresponding to these tradeoffs are formulated and solved: the first minimizes the energy consumption subject to an outage probability (and therefore learning performance) requirement, while the second optimizes the learning performance subject to an energy consumption requirement. Furthermore, the scenario of heterogeneous data distributions is considered, and a new algorithm that handles such heterogeneity is proposed. Extensive simulations demonstrate the effectiveness of the proposed method.
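To make the sign-exchange idea concrete, below is a minimal sketch of sign-compressed gradient exchange with majority-vote aggregation, in the spirit of SignSGD. The quadratic toy objective, the device count, and the learning rate are illustrative assumptions, not the paper's actual system model (which additionally accounts for wireless outages and energy constraints).

```python
import numpy as np

# Illustrative sketch of sign-based gradient exchange with majority-vote
# aggregation (SignSGD-style). The toy objective on each device is
# 0.5 * ||w - target||^2, chosen only so the gradient is easy to verify;
# it is an assumption for this example, not the paper's model.

rng = np.random.default_rng(0)
num_devices, dim, lr, rounds = 8, 10, 0.05, 200

w = rng.normal(size=dim)                        # global model at the server
targets = rng.normal(size=(num_devices, dim))   # each device's local optimum

for _ in range(rounds):
    # Each device computes a local gradient and uploads only its signs:
    # one bit per coordinate instead of a full-precision float.
    local_signs = np.sign(w - targets)          # gradient of 0.5*||w - target||^2

    # The server aggregates by majority vote over the received signs and
    # broadcasts the voted sign vector back (again one bit per coordinate).
    vote = np.sign(local_signs.sum(axis=0))

    # All devices and the server apply the same sign-based update.
    w -= lr * vote

print("distance to mean target:", np.linalg.norm(w - targets.mean(axis=0)))
```

Note that each coordinate costs a single bit on the uplink and a single bit on the downlink per round, which is what makes the scheme attractive over bandwidth-limited wireless links; the paper's contribution is how to choose the local processing and communication parameters for this exchange when only the receiver knows the channel state.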
