Unbounded Gradients in Federated Learning with Buffered Asynchronous Aggregation

Synchronous updates may limit the efficiency of cross-device federated learning as the number of active clients grows. The FedBuff algorithm (Nguyen et al. [4]) alleviates this problem by allowing asynchronous updates (staleness), which enhances the scalability of training while preserving privacy via secure aggregation. We revisit the FedBuff algorithm for asynchronous federated learning and extend the existing analysis by removing the boundedness assumption on the gradient norm. This paper presents a theoretical analysis of the algorithm's convergence rate that accounts for heterogeneity in data, batch size, and delay.
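
The abstract describes the buffered-asynchronous mechanism only in words, so a compact illustration may help. Below is a minimal Python sketch of a FedBuff-style server loop on a toy quadratic problem; it is not the paper's setup. The buffer size K, step sizes eta_l and eta_g, the number of local steps, the delay model, and the quadratic client objectives are all illustrative assumptions.

```python
# Minimal sketch of buffered asynchronous aggregation (FedBuff-style) on a toy
# quadratic problem. All constants below are illustrative, not from the paper.
from collections import deque

import numpy as np

rng = np.random.default_rng(0)
d, num_clients = 5, 20
K = 4                       # buffer size: server updates once K client deltas arrive
eta_l, eta_g = 0.05, 1.0    # local / server step sizes (assumed values)
local_steps, max_delay = 5, 3

# Heterogeneous client objectives: f_i(x) = 0.5 * ||x - c_i||^2
client_optima = rng.normal(size=(num_clients, d))

def local_update(x_stale, i):
    """Run a few local SGD steps from a (possibly stale) server model and return the delta."""
    x = x_stale.copy()
    for _ in range(local_steps):
        grad = x - client_optima[i] + 0.1 * rng.normal(size=d)  # stochastic gradient
        x -= eta_l * grad
    return x - x_stale

x_server = np.zeros(d)
history = deque([x_server.copy()], maxlen=max_delay + 1)  # past server models (staleness)
buffer, buffered = np.zeros(d), 0

for t in range(2000):
    i = int(rng.integers(num_clients))
    stale_model = history[int(rng.integers(len(history)))]  # client trained on a delayed model
    buffer += local_update(stale_model, i)   # secure aggregation would sum deltas server-side
    buffered += 1
    if buffered == K:                        # server step only once K updates are buffered
        x_server = x_server + eta_g * buffer / K
        history.append(x_server.copy())
        buffer, buffered = np.zeros(d), 0

print("distance to minimizer of the average objective:",
      np.linalg.norm(x_server - client_optima.mean(axis=0)))
```

The point mirrored from the abstract is that the server applies an update only after K possibly stale client deltas have been accumulated, which is also what makes the scheme compatible with secure aggregation of the summed deltas.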

[1] Sebastian U. Stich, et al. Sharper Convergence Guarantees for Asynchronous SGD for Distributed and Federated Learning, 2022, NeurIPS.

[2] Blake E. Woodworth, et al. Asynchronous SGD Beats Minibatch SGD Under Arbitrary Delays, 2022, NeurIPS.

[3] Suhas Diggavi, et al. A Field Guide to Federated Optimization, 2021, arXiv.

[4] Michael G. Rabbat, et al. Federated Learning with Buffered Asynchronous Aggregation, 2021, AISTATS.

[5] P. Kairouz, et al. The Distributed Discrete Gaussian Mechanism for Federated Learning with Secure Aggregation, 2021, ICML.

[6] Qinghua Liu, et al. Tackling the Objective Inconsistency Problem in Heterogeneous Federated Optimization, 2020, NeurIPS.

[7] Hamid Reza Feyzmahdavian, et al. Advances in Asynchronous Parallel and Distributed Optimization, 2020, Proceedings of the IEEE.

[8] Nguyen H. Tran, et al. Personalized Federated Learning with Moreau Envelopes, 2020, NeurIPS.

[9] Shusen Yang, et al. Asynchronous Federated Learning with Differential Privacy for Edge Intelligence, 2019, arXiv.

[10] Richard Nock, et al. Advances and Open Problems in Federated Learning, 2019, Found. Trends Mach. Learn.

[11] Peter Richtárik, et al. Tighter Theory for Local SGD on Identical and Heterogeneous Data, 2019, AISTATS.

[12] Martin Jaggi, et al. Decentralized Deep Learning with Arbitrary Communication Compression, 2019, ICLR.

[13] Indranil Gupta, et al. Asynchronous Federated Optimization, 2019, arXiv.

[14] Shenghuo Zhu, et al. Parallel Restarted SGD with Faster Convergence and Less Communication: Demystifying Why Model Averaging Works for Deep Learning, 2018, AAAI.

[15] Sebastian U. Stich, et al. Local SGD Converges Fast and Communicates Little, 2018, ICLR.

[16] Sarvar Patel, et al. Practical Secure Aggregation for Federated Learning on User-Held Data, 2016, arXiv.

[17] Peter Richtárik, et al. Federated Learning: Strategies for Improving Communication Efficiency, 2016, arXiv.

[18] Blaise Agüera y Arcas, et al. Communication-Efficient Learning of Deep Networks from Decentralized Data, 2016, AISTATS.

[19] Hamid Reza Feyzmahdavian, et al. An Asynchronous Mini-Batch Algorithm for Regularized Stochastic Optimization, 2015, 54th IEEE Conference on Decision and Control (CDC).

[20] Stephen J. Wright, et al. Hogwild!: A Lock-Free Approach to Parallelizing Stochastic Gradient Descent, 2011, NIPS.

[21] César A. Uribe, et al. PARS-Push: Personalized, Asynchronous and Robust Decentralized Optimization, 2022, IEEE Control Systems Letters.

[22] Christopher A. Choquette-Choo, et al. Communication Efficient Federated Learning with Secure Aggregation and Differential Privacy, 2021.

[23] W. Bastiaan Kleijn, et al. Asynchronous Decentralized Optimization With Implicit Stochastic Variance Reduction, 2021, ICML.

[24] Aryan Mokhtari, et al. Personalized Federated Learning with Theoretical Guarantees: A Model-Agnostic Meta-Learning Approach, 2020, NeurIPS.