Asynchronous Coordinate Descent under More Realistic Assumptions

Asynchronous-parallel algorithms have the potential to vastly speed up computation by eliminating costly synchronization. However, our understanding of these algorithms is limited because current convergence results for asynchronous (block) coordinate descent are based on somewhat unrealistic assumptions. In particular, the age of the shared optimization variables used to update a block is assumed to be independent of the block being updated, and the updates are assumed to be applied to randomly chosen blocks. In this paper, we argue that these assumptions either fail to hold or imply less efficient implementations. We then prove convergence of asynchronous-parallel block coordinate descent under more realistic assumptions, in particular, always without the independence assumption. The analysis permits both deterministic (essentially) cyclic and random rules for block choice. Because a bound on the asynchronous delays may or may not be available, we establish convergence for both bounded and unbounded delays. The analysis also covers nonconvex, weakly convex, and strongly convex functions. We construct Lyapunov functions that directly model both objective progress and delays, so delays are not treated as errors or noise. A continuous-time ODE is provided to explain the construction at a high level.
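To make the setting concrete, the following is a minimal sketch of asynchronous block coordinate descent under the random block rule, written with Python threads. The quadratic objective, the block partition, and names such as `worker` are illustrative assumptions, not taken from the paper, and CPython's GIL only approximates true shared-memory asynchrony; the point is that each update reads a possibly stale snapshot of the shared iterate and writes back without synchronization.

```python
import threading
import numpy as np

# Illustrative problem (not from the paper): minimize f(x) = 0.5 * ||A x - b||^2.
rng = np.random.default_rng(0)
n, n_blocks = 100, 10
A = rng.standard_normal((200, n))
b = rng.standard_normal(200)
x = np.zeros(n)                          # shared iterate, updated without locks
blocks = np.array_split(np.arange(n), n_blocks)
L = np.linalg.norm(A, 2) ** 2            # Lipschitz constant of the full gradient
step = 1.0 / L

def worker(n_updates, seed):
    local_rng = np.random.default_rng(seed)
    for _ in range(n_updates):
        x_hat = x.copy()                 # possibly stale snapshot: other threads
                                         # may write x between this read and our write
        i = local_rng.integers(n_blocks) # random block rule
        blk = blocks[i]
        g = A[:, blk].T @ (A @ x_hat - b)  # block partial gradient at the stale point
        x[blk] -= step * g               # unsynchronized in-place block update

threads = [threading.Thread(target=worker, args=(500, s)) for s in range(4)]
for t in threads: t.start()
for t in threads: t.join()
print("final objective:", 0.5 * np.linalg.norm(A @ x - b) ** 2)
```

In this sketch the delay of a block's snapshot naturally depends on how long its gradient takes to compute, which is exactly why the independence assumption criticized above is hard to satisfy in practice.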

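As an illustration of the kind of delay-aware Lyapunov function meant here (the exact weights and constants in the paper may differ), one can augment the objective with a weighted sum of recent iterate differences so that the delayed information appears explicitly in the quantity that decreases:

$$\xi_k \;=\; f(x^k) \;+\; \kappa \sum_{j=1}^{\tau} (\tau - j + 1)\,\bigl\|x^{k-j+1} - x^{k-j}\bigr\|^2,$$

where $\tau$ bounds the delay and $\kappa > 0$ depends on the step size. Showing that $\xi_k$ decreases monotonically along the iterates yields convergence directly, without modeling stale reads as error or noise terms.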