Fast-Convergent Federated Learning

Federated learning has recently emerged as a promising solution for distributing machine learning tasks across modern networks of mobile devices. Recent studies have obtained lower bounds on the expected decrease in model loss achieved in each round of federated learning. However, convergence generally requires a large number of communication rounds, which delays model training and is costly in terms of network resources. In this paper, we propose a fast-convergent federated learning algorithm, called $\mathsf{FOLB}$, which performs intelligent sampling of devices in each round of model training to optimize the expected convergence speed. We first theoretically characterize a lower bound on the improvement obtainable in each round if devices are selected according to the expected improvement their local models will provide to the current global model. We then show that $\mathsf{FOLB}$ obtains this bound through uniform sampling by weighting device updates according to their gradient information. $\mathsf{FOLB}$ is able to handle both communication and computation heterogeneity among devices by adapting its aggregations according to estimates of each device's capability to contribute to the updates. We evaluate $\mathsf{FOLB}$ in comparison with existing federated learning algorithms and experimentally show its improvement in trained model accuracy, convergence speed, and/or model stability across various machine learning tasks and datasets.
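To see why alignment-based weighting can tighten the per-round improvement, recall the standard descent lemma for an $L$-smooth global loss $F$; this is a generic illustration of the reasoning, not the paper's exact bound, and the step size $\eta$ and aggregated update $\Delta^t$ are notation introduced here:

$$F(w^{t+1}) \;\le\; F(w^t) \;-\; \eta\,\bigl\langle \nabla F(w^t),\, \Delta^t \bigr\rangle \;+\; \frac{L\eta^2}{2}\,\bigl\lVert \Delta^t \bigr\rVert^2, \qquad w^{t+1} = w^t - \eta\,\Delta^t.$$

The guaranteed decrease grows with the inner product $\langle \nabla F(w^t), \Delta^t \rangle$, so an aggregation that up-weights devices whose local gradients align with the global gradient directly improves this term.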

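As a concrete illustration of the alignment-based weighting described in the abstract, the following is a minimal sketch of one aggregation round in Python. All names here (`folb_aggregate`, the flattened NumPy parameter vectors, the non-negative normalization, the crude global-gradient estimate) are illustrative assumptions rather than the paper's exact procedure:

```python
import numpy as np

def folb_aggregate(global_model, local_updates, local_grads, global_grad_est):
    """One illustrative aggregation round: weight each sampled device's
    update by how well its local gradient aligns with an estimate of
    the global gradient (a sketch, not the paper's exact rule)."""
    # Alignment score per device: inner product with the global-gradient estimate.
    scores = np.array([float(np.dot(g, global_grad_est)) for g in local_grads])
    # Keep only non-negative contributions and normalize; a small epsilon
    # guards against division by zero when no device aligns.
    scores = np.maximum(scores, 0.0)
    weights = scores / (scores.sum() + 1e-12)
    # Weighted combination of local updates replaces FedAvg's uniform mean.
    aggregated = sum(w * u for w, u in zip(weights, local_updates))
    return global_model + aggregated

# Example usage with random stand-in data (dimension and device count arbitrary):
if __name__ == "__main__":
    rng = np.random.default_rng(0)
    d, n_devices = 10, 5
    w = rng.normal(size=d)                        # current global model
    updates = [rng.normal(size=d) for _ in range(n_devices)]
    grads = [-u for u in updates]                 # pretend update ~ negative gradient step
    g_global = np.mean(grads, axis=0)             # crude global-gradient estimate
    print(folb_aggregate(w, updates, grads, g_global))
```

Devices with updates orthogonal or opposed to the global descent direction receive little or no weight, which is the intuition behind selecting (or re-weighting) devices by their expected contribution to the global model.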