Optimal Algorithms for Distributed Optimization

In this paper, we study the optimal convergence rate for distributed convex optimization problems in networks. We model the communication restrictions imposed by the network as a set of affine constraints and provide optimal complexity bounds for four different setups, namely when the function $F(\mathbf{x}) \triangleq \sum_{i=1}^{m} f_i(\mathbf{x})$ is (i) strongly convex and smooth, (ii) strongly convex only, (iii) smooth only, or (iv) just convex. Our results show that Nesterov's accelerated gradient descent on the dual problem can be executed in a distributed manner and obtains the same optimal rates as in the centralized version of the problem (up to constant or logarithmic factors), with an additional cost related to the spectral gap of the interaction matrix. Finally, we discuss some extensions to the proposed setup, such as proximal-friendly functions, time-varying graphs, and improvement of the condition numbers.
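
To illustrate the idea of running an accelerated gradient method on the dual of an affinely constrained problem, the following is a minimal sketch for the strongly convex and smooth case, not the paper's exact algorithm. It makes several assumptions that are not in the abstract: each agent holds a scalar quadratic $f_i(x) = \tfrac{1}{2} a_i (x - b_i)^2$, the interaction matrix $W$ is the Laplacian of an undirected ring, the consensus constraint is written as $\sqrt{W}\mathbf{x} = 0$, and the dual variable is tracked through $z = \sqrt{W} y$ so that each update only multiplies by $W$ (one exchange with neighbors). The names `ring_laplacian` and `dual_accelerated_consensus` are hypothetical.

```python
import numpy as np


def ring_laplacian(m):
    """Graph Laplacian of an undirected ring with m nodes (assumed topology)."""
    W = 2.0 * np.eye(m)
    for i in range(m):
        W[i, (i + 1) % m] -= 1.0
        W[i, (i - 1) % m] -= 1.0
    return W


def dual_accelerated_consensus(a, b, W, iters=300):
    """Nesterov's constant-momentum method on the dual of
    min sum_i 0.5*a_i*(x_i - b_i)^2  s.t.  sqrt(W) x = 0.

    Returns the local primal estimates x_i after `iters` rounds.
    """
    eig = np.linalg.eigvalsh(W)            # eigenvalues in ascending order
    lam_max = eig[-1]
    lam_min_pos = eig[eig > 1e-9][0]       # smallest nonzero eigenvalue
    mu, L = a.min(), a.max()               # strong convexity / smoothness of F

    L_dual = lam_max / mu                  # smoothness of the dual function
    mu_dual = lam_min_pos / L              # strong convexity on range(sqrt(W))
    kappa = L_dual / mu_dual               # equals (L/mu) * (lam_max/lam_min_pos)
    beta = (np.sqrt(kappa) - 1.0) / (np.sqrt(kappa) + 1.0)

    z = np.zeros_like(b)                   # z = sqrt(W) y, one entry per agent
    w = z.copy()                           # extrapolated point
    for _ in range(iters):
        # Local step: x_i = argmax_x { w_i * x - f_i(x) } = b_i + w_i / a_i.
        x_local = b + w / a
        # Communication step: multiplying by W only mixes neighbor values.
        z_next = w - (W @ x_local) / L_dual
        w = z_next + beta * (z_next - z)
        z = z_next
    return b + z / a


a = np.array([1.0, 2.0, 3.0, 4.0, 2.5])
b = np.array([0.5, -1.0, 2.0, 3.0, 1.5])
x = dual_accelerated_consensus(a, b, ring_laplacian(5))
x_star = (a @ b) / a.sum()                 # centralized minimizer of sum_i f_i
print(np.max(np.abs(x - x_star)))
```

In this sketch the network enters only through `kappa`, which multiplies the condition number `L / mu` of the objective by the ratio of the largest to the smallest nonzero eigenvalue of `W`. This is one way to see where an extra cost tied to the spectrum of the interaction matrix can arise.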
