Federated Adam-Type Algorithm for Distributed Optimization With Lazy Strategy

For large-scale machine learning tasks, distributing data across multiple clients and using distributed optimization algorithms with a parameter server can accelerate training. The federated averaging algorithm has been widely used for distributed optimization: local models are trained in parallel and aggregated at a server to obtain the global model. To further improve its performance, a novel federated learning algorithm is proposed in this article by embedding a lazy strategy into a distributed Adam-type algorithm. In the proposed algorithm, the learning rate is adjusted adaptively during local updates, and a lazy update strategy is applied to the clients' second-order momentum so that all clients share an identical learning rate. Convergence of the proposed algorithm is established for both convex and nonconvex loss functions. Experiments have been conducted on the MNIST digit recognition and CIFAR-10 data sets. The results show that the proposed algorithm significantly reduces the communication overhead, thereby reducing the training time by 60% on CIFAR-10, and that it achieves better performance than the federated averaging algorithm and its momentum variant.
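To make the "lazy second-order momentum" idea concrete, the sketch below illustrates one way such a scheme could look: each client runs Adam-type local steps with a frozen (lazily synchronized) second-order moment, and the server averages the models and refreshes that moment at each communication round. This is a minimal illustration of the idea described in the abstract, not the authors' exact algorithm; all names (e.g., local_lazy_adam_round, server_aggregate) and hyperparameter choices are assumptions.

```python
# Minimal sketch of a federated Adam-type update with a lazy second-order
# moment (illustrative only; not the paper's exact algorithm).
import numpy as np

def local_lazy_adam_round(w, m, v_lazy, grad_fn, steps=5,
                          lr=1e-3, beta1=0.9, eps=1e-8):
    """Run `steps` Adam-type local updates on one client.

    The second-order moment `v_lazy` is kept fixed during local steps
    (the "lazy" part), so every client uses the same adaptive learning
    rate lr / (sqrt(v_lazy) + eps) between communication rounds.
    """
    v_acc = np.zeros_like(w)                        # squared gradients deferred to the server
    for _ in range(steps):
        g = grad_fn(w)                              # stochastic gradient on local data
        m = beta1 * m + (1.0 - beta1) * g           # first-order momentum updated locally
        v_acc += g * g                              # second-order statistics accumulated lazily
        w = w - lr * m / (np.sqrt(v_lazy) + eps)    # adaptive local step with shared v
    return w, m, v_acc

def server_aggregate(client_ws, client_vs, v_lazy, beta2=0.999):
    """Average client models and refresh the shared second-order moment."""
    w_global = np.mean(client_ws, axis=0)
    v_lazy = beta2 * v_lazy + (1.0 - beta2) * np.mean(client_vs, axis=0)
    return w_global, v_lazy
```

Because the second-order moment changes only at aggregation, all clients apply identical adaptive learning rates between rounds, which makes simple model averaging consistent and limits the extra communication to one momentum exchange per round.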
