Communication Efficient Framework for Decentralized Machine Learning

In this paper, we propose a fast, privacy-aware, and communication-efficient decentralized framework for solving the distributed machine learning (DML) problem. The proposed algorithm, GADMM, is based on the Alternating Direction Method of Multipliers (ADMM). Its key novelty is that it solves the problem over a decentralized topology in which at most half of the workers compete for the limited communication resources at any given time. Moreover, each worker exchanges its locally trained model only with its two neighboring workers, thereby training a global model with lower communication overhead per exchange. We prove that GADMM converges faster than centralized batch gradient descent for convex loss functions, and we numerically show that it converges faster and is more communication-efficient than state-of-the-art communication-efficient algorithms such as the Lazily Aggregated Gradient (LAG) and dual averaging, on linear and logistic regression tasks with synthetic and real datasets.
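To make the alternating communication schedule concrete, the following minimal sketch simulates a GADMM-style update for decentralized linear regression on a chain of workers split into a head group (even indices) and a tail group (odd indices), so that only one group updates and transmits at a time and each worker exchanges its model only with its two neighbors. All names (local_solve, rho, the synthetic data, and the closed-form quadratic primal update) are illustrative assumptions, not the paper's reference implementation.

```python
# Minimal GADMM-style sketch (assumed setup): N workers on a chain, each with a
# local least-squares objective; head and tail groups update alternately and
# dual variables live on the chain links between neighbors.
import numpy as np

np.random.seed(0)
N, d, m, rho = 6, 5, 20, 1.0            # workers, model dim, samples/worker, penalty
theta_true = np.random.randn(d)
A = [np.random.randn(m, d) for _ in range(N)]                       # local features
b = [A[n] @ theta_true + 0.01 * np.random.randn(m) for n in range(N)]  # local labels

theta = [np.zeros(d) for _ in range(N)]         # local models
lam = [np.zeros(d) for _ in range(N - 1)]       # dual variables on links (n, n+1)

def local_solve(n):
    """Closed-form primal update of worker n for its quadratic local loss,
    using only the current models of its (at most two) chain neighbors."""
    rhs, scale = A[n].T @ b[n], 0.0
    if n > 0:                                   # left neighbor n-1
        rhs += lam[n - 1] + rho * theta[n - 1]
        scale += rho
    if n < N - 1:                               # right neighbor n+1
        rhs += -lam[n] + rho * theta[n + 1]
        scale += rho
    return np.linalg.solve(A[n].T @ A[n] + scale * np.eye(d), rhs)

for k in range(100):
    for n in range(0, N, 2):                    # head group updates in parallel
        theta[n] = local_solve(n)               # (this model is what gets sent to neighbors)
    for n in range(1, N, 2):                    # tail group updates using fresh head models
        theta[n] = local_solve(n)
    for n in range(N - 1):                      # dual update, local to each link
        lam[n] += rho * (theta[n] - theta[n + 1])

print("max distance to ground truth:",
      max(np.linalg.norm(theta[n] - theta_true) for n in range(N)))
```

In this simulated schedule only half of the workers transmit in each phase, and every transmission carries a single model to just two neighbors, which is the source of the communication savings the abstract describes.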
