Worst-case Complexity of Cyclic Coordinate Descent: $O(n^2)$ Gap with Randomized Version

This paper studies the worst-case complexity of cyclic coordinate descent (C-CD) for minimizing a convex quadratic function. For this problem, C-CD is equivalent to the Gauss-Seidel method and can be transformed into the Kaczmarz method and projection onto convex sets (POCS). We observe that the best known provable complexity bound for C-CD can be $O(n^2)$ times worse than that of randomized coordinate descent (R-CD), but no example had been rigorously proven to exhibit such a large gap. In this paper we show that the gap indeed exists. We prove that there exists an example for which C-CD takes at least $O(n^4 \kappa_{\text{CD}} \log\frac{1}{\epsilon})$ operations, where $\kappa_{\text{CD}}$ is related to Demmel's condition number and determines the convergence rate of R-CD. This implies that in the worst case C-CD can indeed be $O(n^2)$ times slower than R-CD, which has complexity $O( n^2 \kappa_{\text{CD}} \log\frac{1}{\epsilon})$. Note that for this example the gap exists for any fixed update order, not just a particular order. Based on the example, we establish several almost tight complexity bounds of C-CD for quadratic problems. One difficulty in the analysis is that the spectral radius of a non-symmetric iteration matrix does not necessarily constitute a \textit{lower bound} for the convergence rate. An immediate consequence of our results is that for the Gauss-Seidel method, the Kaczmarz method and POCS, there is also an $O(n^2)$ gap between the cyclic and randomized versions (for solving linear systems). We also show that the classical convergence rate of POCS by Smith, Solmon and Wagner [39] is always worse, and sometimes infinitely many times worse, than our bound.
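As an illustration of the two methods compared above, the following is a minimal sketch of C-CD and R-CD with exact coordinate minimization on $f(x) = \frac{1}{2} x^T A x - b^T x$. The function names, the test matrix (unit diagonal, constant off-diagonal entry $c$), the starting point and the epoch count are all assumptions chosen for illustration; they are not taken from the paper, and the sketch is not meant to reproduce the $O(n^2)$ gap.

```python
import random


def coordinate_step(A, b, x, i):
    """Exactly minimize f(x) = 0.5*x^T A x - b^T x over coordinate i, in place."""
    partial_gradient = sum(A[i][j] * x[j] for j in range(len(x))) - b[i]
    x[i] -= partial_gradient / A[i][i]


def objective(A, b, x):
    """Evaluate f(x) = 0.5*x^T A x - b^T x."""
    n = len(x)
    quad = sum(x[i] * A[i][j] * x[j] for i in range(n) for j in range(n))
    return 0.5 * quad - sum(b[i] * x[i] for i in range(n))


def cyclic_cd(A, b, x0, epochs):
    """C-CD: sweep the coordinates in the fixed order 0, 1, ..., n-1."""
    x = list(x0)
    for _ in range(epochs):
        for i in range(len(x)):
            coordinate_step(A, b, x, i)
    return x


def randomized_cd(A, b, x0, epochs, seed=0):
    """R-CD: pick each coordinate uniformly at random, n picks per epoch."""
    rng = random.Random(seed)
    x = list(x0)
    n = len(x)
    for _ in range(epochs * n):
        coordinate_step(A, b, x, rng.randrange(n))
    return x


def constant_offdiagonal_matrix(n, c):
    """Hypothetical test matrix: 1 on the diagonal, c elsewhere (0 < c < 1)."""
    return [[1.0 if i == j else c for j in range(n)] for i in range(n)]


# Illustrative setup (all values are assumptions): b = 0, so the
# minimizer is x* = 0 and the optimal value is f* = 0.
n, c = 20, 0.8
A = constant_offdiagonal_matrix(n, c)
b = [0.0] * n
x0 = [float((-1) ** i * (i + 1)) for i in range(n)]

f_start = objective(A, b, x0)
f_cyclic = objective(A, b, cyclic_cd(A, b, x0, epochs=50))
f_random = objective(A, b, randomized_cd(A, b, x0, epochs=50))
print(f_start, f_cyclic, f_random)
```

Each coordinate update costs $O(n)$ operations, so one epoch of either method costs $O(n^2)$. Comparing `f_cyclic` and `f_random` for growing `n` and `c` close to 1 is one way to explore empirically how the two methods differ, though a small experiment like this does not by itself establish the worst-case bound.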

[1]  Zhaosong Lu,et al.  An Accelerated Proximal Coordinate Gradient Method and its Application to Regularized Empirical Risk Minimization , 2014, 1407.1296.

[2]  Zhi-Quan Luo,et al.  Cross-Layer Provision of Future Cellular Networks: A WMMSE-based approach , 2014, IEEE Signal Processing Magazine.

[3]  Tamara G. Kolda,et al.  Tensor Decompositions and Applications , 2009, SIAM Rev..

[4]  Hein Hundal,et al.  The Rate of Convergence for the Method of Alternating Projections, II , 1997 .

[5]  A. Galántai Projectors and Projection Methods , 2003 .

[6]  Lin Xiao,et al.  On the complexity analysis of randomized block-coordinate descent methods , 2013, Mathematical Programming.

[7]  J. Neumann On Rings of Operators. Reduction Theory , 1949 .

[8]  Zhi-Quan Luo,et al.  A Unified Convergence Analysis of Block Successive Minimization Methods for Nonsmooth Optimization , 2012, SIAM J. Optim..

[9]  Lin Xiao,et al.  An Accelerated Randomized Proximal Coordinate Gradient Method and its Application to Regularized Empirical Risk Minimization , 2015, SIAM J. Optim..

[10]  G. Forsythe,et al.  On best conditioned matrices , 1955 .

[11]  Zeyuan Allen Zhu,et al.  Nearly-Linear Time Positive LP Solver with Faster Convergence Rate , 2015, STOC.

[12]  Zhi-Quan Luo,et al.  Joint Base Station Clustering and Beamformer Design for Partial Coordinated Transmission in Heterogeneous Networks , 2012, IEEE Journal on Selected Areas in Communications.

[13]  D. Spielman,et al.  Smoothed analysis of algorithms: Why the simplex algorithm usually takes polynomial time , 2004 .

[14]  M. J. D. Powell,et al.  On search directions for minimization algorithms , 1973, Math. Program..

[15]  Mingyi Hong,et al.  Improved Iteration Complexity Bounds of Cyclic Block Coordinate Descent for Convex Problems , 2015, NIPS.

[16]  Weizhu Chen,et al.  DSCOVR: Randomized Primal-Dual Block Coordinate Algorithms for Asynchronous Distributed Optimization , 2017, J. Mach. Learn. Res..

[17]  Stephen J. Wright,et al.  Random permutations fix a worst case for cyclic coordinate descent , 2016, IMA Journal of Numerical Analysis.

[18]  T. Hastie,et al.  SparseNet: Coordinate Descent With Nonconvex Penalties , 2011, Journal of the American Statistical Association.

[19]  Joseph K. Bradley,et al.  Parallel Coordinate Descent for L1-Regularized Loss Minimization , 2011, ICML.

[20]  Luigi Grippo,et al.  On the convergence of the block nonlinear Gauss-Seidel method under convex constraints , 2000, Oper. Res. Lett..

[21]  Trevor Hastie,et al.  Regularization Paths for Generalized Linear Models via Coordinate Descent. , 2010, Journal of statistical software.

[22]  Yin Tat Lee,et al.  Efficient Accelerated Coordinate Descent Methods and Faster Algorithms for Solving Linear Systems , 2013, 2013 IEEE 54th Annual Symposium on Foundations of Computer Science.

[23]  Peter Richtárik,et al.  Randomized Dual Coordinate Ascent with Arbitrary Sampling , 2014, ArXiv.

[24]  Adrian S. Lewis,et al.  Randomized Methods for Linear Constraints: Convergence Rates and Conditioning , 2008, Math. Oper. Res..

[25]  Marc Teboulle,et al.  A Fast Iterative Shrinkage-Thresholding Algorithm for Linear Inverse Problems , 2009, SIAM J. Imaging Sci..

[26]  Zhi-Quan Luo,et al.  Iteration complexity analysis of block coordinate descent methods , 2013, Mathematical Programming.

[27]  Ion Necoara,et al.  Efficient random coordinate descent algorithms for large-scale structured nonconvex optimization , 2013, Journal of Global Optimization.

[28]  Zhi-Quan Luo,et al.  On the linear convergence of the alternating direction method of multipliers , 2012, Mathematical Programming.

[29]  A. Edelman Eigenvalues and condition numbers of random matrices , 1988 .

[30]  M. Raydan,et al.  Alternating Projection Methods , 2011 .

[31]  Ambuj Tewari,et al.  On the Nonasymptotic Convergence of Cyclic Coordinate Descent Methods , 2013, SIAM J. Optim..

[32]  Gary L. Miller,et al.  Approaching optimality for solving SDD linear systems , 2011 .

[33]  Stephen J. Wright Coordinate descent algorithms , 2015, Mathematical Programming.

[34]  Peter Richtárik,et al.  Accelerated, Parallel, and Proximal Coordinate Descent , 2013, SIAM J. Optim..

[35]  Zhi-Quan Luo,et al.  Long-term transmit point association for coordinated multipoint transmission by stochastic optimization , 2013, 2013 IEEE 14th Workshop on Signal Processing Advances in Wireless Communications (SPAWC).

[36]  Howard L. Weinert,et al.  Error bounds for the method of alternating projections , 1988, Math. Control. Signals Syst..

[37]  Yurii Nesterov,et al.  Efficiency of Coordinate Descent Methods on Huge-Scale Optimization Problems , 2012, SIAM J. Optim..

[38]  Tianyi Lin,et al.  On the Convergence Rate of Multi-Block ADMM , 2014, 1408.4265.

[39]  Kennan T. Smith,et al.  Practical and mathematical aspects of the problem of reconstructing objects from radiographs , 1977 .

[40]  Amir Beck,et al.  On the Convergence of Block Coordinate Descent Type Methods , 2013, SIAM J. Optim..

[41]  Shang-Hua Teng,et al.  Nearly-Linear Time Algorithms for Preconditioning and Solving Symmetric, Diagonally Dominant Linear Systems , 2006, SIAM J. Matrix Anal. Appl..

[42]  Zhi-Quan Luo,et al.  Guaranteed Matrix Completion via Nonconvex Factorization , 2015, FOCS.

[43]  Shai Shalev-Shwartz,et al.  Stochastic dual coordinate ascent methods for regularized loss , 2012, J. Mach. Learn. Res..

[44]  P. Oswald On the convergence rate of SOR: A worst case estimate , 2005, Computing.

[45]  Anne Greenbaum,et al.  Iterative methods for solving linear systems , 1997, Frontiers in applied mathematics.

[46]  Peter Richtárik,et al.  Iteration complexity of randomized block-coordinate descent methods for minimizing a composite function , 2011, Mathematical Programming.

[47]  H. Zou,et al.  A coordinate majorization descent algorithm for ℓ1 penalized learning , 2014 .

[48]  R. Vershynin,et al.  A Randomized Kaczmarz Algorithm with Exponential Convergence , 2007, math/0702226.

[49]  Christopher Ré,et al.  Parallel stochastic gradient algorithms for large-scale matrix completion , 2013, Mathematical Programming Computation.

[50]  Inderjit S. Dhillon,et al.  PASSCoDe: Parallel ASynchronous Stochastic dual Co-ordinate Descent , 2015, ICML.

[51]  Yuchen Zhang,et al.  Stochastic Primal-Dual Coordinate Method for Regularized Empirical Risk Minimization , 2014, ICML.