To Talk or to Work: Flexible Communication Compression for Energy Efficient Federated Learning over Heterogeneous Mobile Edge Devices

Recent advances in machine learning, wireless communication, and mobile hardware make federated learning (FL) over massive numbers of mobile edge devices practical, opening new horizons for numerous intelligent mobile applications. Despite the potential benefits, FL imposes heavy communication and computation burdens on participating devices due to periodic global synchronization and continuous local training, posing great challenges to battery-constrained mobile devices. In this work, we aim to improve the energy efficiency of FL over mobile edge networks to accommodate heterogeneous participating devices without sacrificing learning performance. To this end, we develop a convergence-guaranteed FL algorithm that enables flexible communication compression. Guided by the derived convergence bound, we design a compression control scheme that balances the energy consumption of local computing (i.e., “working”) and wireless communication (i.e., “talking”) from a long-term learning perspective. In particular, the compression parameters are chosen for each FL participant adaptively to its computing and communication environment. Extensive simulations on various datasets validate our theoretical analysis and demonstrate the efficacy of the proposed scheme in energy saving.
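To make the idea of flexible, per-device compression concrete, the sketch below shows how heterogeneous compression levels could be applied in a single FL round using top-k sparsification of local updates. This is a minimal illustration under assumed choices (NumPy, top-k sparsification as the compression operator, and hypothetical names such as topk_compress and the per-device ratios); it is not the paper's actual algorithm or its compression control scheme, which selects the parameters from the derived convergence bound.

    import numpy as np

    def topk_compress(update, ratio):
        """Keep only the largest-magnitude `ratio` fraction of entries of a flat update.

        A smaller ratio means fewer bits over the wireless link ("talking"),
        typically at the cost of more local computation to converge ("working").
        """
        k = max(1, int(ratio * update.size))
        idx = np.argpartition(np.abs(update), -k)[-k:]   # indices of the k largest magnitudes
        compressed = np.zeros_like(update)
        compressed[idx] = update[idx]
        return compressed

    # Toy round: each device sparsifies its local update with its own ratio
    # (reflecting its computing/communication environment) before uploading,
    # and the server averages the sparse uploads into a global update.
    rng = np.random.default_rng(0)
    device_updates = [rng.standard_normal(1000) for _ in range(4)]
    device_ratios = [0.05, 0.1, 0.2, 0.05]               # illustrative per-device compression levels
    uploads = [topk_compress(u, r) for u, r in zip(device_updates, device_ratios)]
    global_update = np.mean(uploads, axis=0)

In this toy setup, a device with a poor wireless channel or a tight energy budget would be assigned a smaller ratio, trading upload cost against convergence speed; the paper's scheme makes that trade-off systematically rather than by hand-picked constants as above.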
