A2BCD: An Asynchronous Accelerated Block Coordinate Descent Algorithm With Optimal Complexity

In this paper, we propose the Asynchronous Accelerated Nonuniform Randomized Block Coordinate Descent algorithm (A2BCD), the first asynchronous Nesterov-accelerated algorithm to achieve optimal complexity. This parallel algorithm solves the unconstrained convex minimization problem using p computing nodes that compute updates to shared solution vectors asynchronously, with no central coordination. Nodes in asynchronous algorithms do not wait for updates from other nodes before starting a new iteration; they simply compute updates using the most recent solution information available. This allows them to complete iterations much faster than nodes in synchronous algorithms, especially at scale, by eliminating the costly synchronization penalty. We first prove that A2BCD converges linearly to a solution at a fast accelerated rate matching that of the recently proposed NU_ACDM, so long as the maximum delay is not too large. Somewhat surprisingly, A2BCD pays no complexity penalty for using outdated information. We then prove lower complexity bounds for randomized coordinate descent methods, which show that A2BCD (and hence NU_ACDM) has optimal complexity to within a constant factor. We confirm with numerical experiments that A2BCD outperforms NU_ACDM, the fastest existing coordinate descent algorithm, even at small scale. Finally, we derive and analyze a second-order ordinary differential equation that is the continuous-time limit of our algorithm, and prove that it converges linearly to a solution at a similar accelerated rate.
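To make the asynchronous update pattern concrete, here is a minimal single-process sketch (not the paper's implementation): it runs nonuniformly sampled coordinate descent on a least-squares objective while reading iterates that may be up to tau steps stale, mimicking a node that computes from whatever shared state it last saw. The Nesterov coupling that distinguishes A2BCD from plain asynchronous coordinate descent is omitted for brevity, and the objective, constants, and all names below are illustrative assumptions.

# Sketch of asynchronous coordinate updates with bounded staleness.
# Sampling is nonuniform, p_i proportional to sqrt(L_i), as in NU_ACDM;
# the acceleration (Nesterov coupling) of A2BCD is omitted for clarity.
import numpy as np

rng = np.random.default_rng(0)

# f(x) = 0.5 * ||A x - b||^2 is smooth and (almost surely) strongly convex.
n, d = 200, 50
A = rng.standard_normal((n, d))
b = rng.standard_normal(n)

L = (A ** 2).sum(axis=0)              # coordinate Lipschitz constants L_i
p = np.sqrt(L) / np.sqrt(L).sum()     # sampling probabilities p_i ~ sqrt(L_i)

x = np.zeros(d)
tau = 10                              # staleness bound (maximum delay)
history = [x.copy()]                  # recent iterates a "node" might read

for k in range(20000):
    i = rng.choice(d, p=p)
    # Read a possibly outdated iterate: an asynchronous node computes its
    # update from whatever shared state it last saw, without waiting.
    delay = rng.integers(0, min(tau, len(history)))
    x_stale = history[-1 - delay]
    g_i = A[:, i] @ (A @ x_stale - b)  # i-th partial derivative at stale point
    x[i] -= g_i / L[i]                 # coordinate step with step size 1/L_i
                                       # (larger delays may need a smaller step)
    history.append(x.copy())
    if len(history) > tau:
        history.pop(0)

print("final objective:", 0.5 * np.linalg.norm(A @ x - b) ** 2)

In an actual deployment there is no history buffer: each of the p nodes reads and overwrites the single shared solution vector directly, and the delay arises from the hardware itself, which is the regime the paper's analysis covers.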
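For context on the final claim, the paper derives the specific ODE limit of A2BCD from the algorithm's own parameters; the exact equation appears there. As a generic illustration of this style of result (an analogue, not the paper's equation), the constant-damping ODE associated with Nesterov's method for a mu-strongly convex f is

\[
\ddot{X}(t) + 2\sqrt{\mu}\,\dot{X}(t) + \nabla f\big(X(t)\big) = 0,
\qquad
E(t) = f(X(t)) - f(x^\star) + \tfrac{1}{2}\,\big\|\dot{X}(t) + \sqrt{\mu}\,\big(X(t) - x^\star\big)\big\|^2,
\]

where strong convexity yields \( \dot{E}(t) \le -\sqrt{\mu}\,E(t) \), hence \( E(t) \le e^{-\sqrt{\mu}\,t} E(0) \): linear convergence at an accelerated rate in continuous time, the kind of guarantee the paper proves for its own ODE.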
