Communication-efficient algorithms for decentralized and stochastic optimization

We present a new class of decentralized first-order methods for nonsmooth and stochastic optimization problems defined over multiagent networks. Because communication is a major bottleneck in decentralized optimization, our main goal in this paper is to develop algorithmic frameworks that significantly reduce the number of inter-node communications. Our major contribution is a new class of decentralized primal–dual type algorithms, namely the decentralized communication sliding (DCS) methods, which skip inter-node communications while agents solve the primal subproblems iteratively through linearizations of their local objective functions. By employing DCS, agents can find an $\epsilon$-solution, both in terms of functional optimality gap and feasibility residual, in $\mathcal{O}(1/\epsilon)$ (resp., $\mathcal{O}(1/\sqrt{\epsilon})$) communication rounds for general convex functions (resp., strongly convex functions), while maintaining the $\mathcal{O}(1/\epsilon^2)$ (resp., $\mathcal{O}(1/\epsilon)$) bound on the total number of intra-node subgradient evaluations. We also present a stochastic counterpart of these algorithms, denoted by SDCS, for solving stochastic optimization problems whose objective function cannot be evaluated exactly. In comparison with existing results for decentralized nonsmooth and stochastic optimization, we reduce the total number of inter-node communication rounds by orders of magnitude while still maintaining the optimal complexity bounds on intra-node stochastic subgradient evaluations. Under certain conditions on the target accuracy, the bounds on the (stochastic) subgradient evaluations are comparable to those required for centralized nonsmooth and stochastic optimization.
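To give a feel for the gap between the two bounds stated above, the following minimal sketch evaluates the orders $1/\epsilon$, $1/\sqrt{\epsilon}$, and $1/\epsilon^2$ for a given target accuracy. It is only an arithmetic illustration: all hidden constants are set to 1 (an assumption, since the abstract does not give them), the function names `complexity_orders` and `communication_saving` are hypothetical, and the baseline used for comparison (one communication round per subgradient evaluation) is likewise an assumption made here for illustration.

```python
import math


def complexity_orders(eps, strongly_convex=False):
    """Return (communication_rounds, subgradient_evaluations) for accuracy eps.

    Only the orders of magnitude from the abstract are used; every hidden
    constant is assumed to be 1, so the numbers are illustrative, not bounds.
    """
    if not 0.0 < eps < 1.0:
        raise ValueError("eps must lie in (0, 1)")
    if strongly_convex:
        comm = math.ceil(1.0 / math.sqrt(eps))  # O(1/sqrt(eps)) rounds
        grads = math.ceil(1.0 / eps)            # O(1/eps) evaluations
    else:
        comm = math.ceil(1.0 / eps)             # O(1/eps) rounds
        grads = math.ceil(1.0 / eps ** 2)       # O(1/eps^2) evaluations
    return comm, grads


def communication_saving(eps, strongly_convex=False):
    """Ratio of subgradient evaluations to communication rounds.

    Assumed baseline: a method that communicates once per subgradient
    evaluation would need as many rounds as evaluations, so this ratio is
    the factor by which communication is reduced under that assumption.
    """
    comm, grads = complexity_orders(eps, strongly_convex)
    return grads / comm


if __name__ == "__main__":
    for eps in (2.0 ** -4, 2.0 ** -10):
        for sc in (False, True):
            comm, grads = complexity_orders(eps, sc)
            label = "strongly convex" if sc else "general convex"
            print(f"eps={eps:.6f} ({label}): "
                  f"{comm} rounds, {grads} subgradient evaluations")
```

With constants ignored, $\epsilon = 2^{-10}$ in the general convex case gives 1024 communication rounds against 1,048,576 subgradient evaluations, so most subgradient steps happen locally between rounds; in the strongly convex case the same accuracy gives 32 rounds against 1024 evaluations.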
