Achieving Acceleration in Distributed Optimization via Direct Discretization of the Heavy-Ball ODE

We develop a distributed algorithm for convex Empirical Risk Minimization, i.e., minimizing a large but finite sum of convex functions over a network. The proposed algorithm is derived by directly discretizing the second-order heavy-ball differential equation and attains an accelerated convergence rate, i.e., faster than distributed gradient-descent-based methods, for strongly convex objectives that are not necessarily smooth. Notably, we achieve acceleration without resorting to Nesterov's well-known momentum approach. We provide numerical experiments that contrast the proposed method with recently proposed optimal distributed optimization algorithms.
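As a rough illustration of the underlying idea (a minimal sketch, not the paper's exact scheme), consider the classical constant-damping heavy-ball ODE and one direct Euler-type discretization; the damping coefficient $a$ and step size $h$ below are illustrative placeholders rather than the paper's actual choices.

```latex
% Illustrative sketch only: classical heavy-ball ODE with constant damping a > 0
% (the paper's ODE and discretization scheme may differ).
\ddot{x}(t) + a\,\dot{x}(t) + \nabla f\bigl(x(t)\bigr) = 0
% A direct explicit-Euler discretization of the equivalent first-order system
% \dot{x} = v, \; \dot{v} = -a v - \nabla f(x), with step size h:
x_{k+1} = x_k + h\, v_k, \qquad
v_{k+1} = v_k - h\bigl(a\, v_k + \nabla f(x_k)\bigr)
```

Eliminating the velocity variable $v_k$ from these two updates recovers a two-term momentum recursion of Polyak's heavy-ball type, obtained here purely from discretizing the ODE rather than from Nesterov-style momentum constructions.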
