Q-GADMM: Quantized Group ADMM for Communication Efficient Decentralized Machine Learning

In this paper, we propose a communication-efficient decentralized machine learning (ML) algorithm, coined quantized group ADMM (Q-GADMM). Every worker in Q-GADMM communicates only with two neighbors and updates its model via the group alternating direction method of multipliers (GADMM), thereby ensuring fast convergence while reducing the number of communication rounds. Furthermore, each worker quantizes its model updates before transmission, thereby decreasing the communication payload sizes. We prove that Q-GADMM converges to the optimal solution for convex loss functions, and numerically show that Q-GADMM yields 7x less communication cost while achieving almost the same accuracy and convergence speed as GADMM without quantization.
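To make the quantization step concrete, below is a minimal sketch of the kind of stochastic uniform quantizer the abstract alludes to: each worker quantizes the difference between its new model and its previously quantized model, so only integer codes plus two scalars need to be transmitted. The function names, the `bits` parameter, and the NumPy implementation are our own illustration under these assumptions, not code from the paper.

```python
import numpy as np

def quantize_update(theta_new, theta_prev_q, bits=2):
    """Illustrative stochastic uniform quantizer for a model update.

    Quantizes the difference between the new model and the previously
    quantized model using 2**bits levels, so only the integer codes and
    two scalars (offset and step size) need to be transmitted.
    """
    diff = theta_new - theta_prev_q
    levels = 2 ** bits - 1
    lo, hi = diff.min(), diff.max()
    step = (hi - lo) / levels if hi > lo else 1.0
    # Stochastic rounding: round up with probability equal to the
    # fractional part, which makes the quantizer unbiased in expectation.
    scaled = (diff - lo) / step
    floor = np.floor(scaled)
    codes = floor + (np.random.rand(*scaled.shape) < (scaled - floor))
    return codes.astype(np.int64), lo, step

def dequantize_update(codes, lo, step, theta_prev_q):
    """Reconstruct the quantized model on the receiving neighbor."""
    return theta_prev_q + lo + codes * step
```

With `bits=2`, each model coordinate is sent as a 2-bit code rather than a 32-bit float, which is the mechanism behind the reported reduction in communication payload size.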
