Understanding Limitation of Two Symmetrized Orders by Worst-case Complexity

It was recently found that the standard version of multi-block cyclic ADMM can diverge. Interestingly, Gaussian Back Substitution ADMM (GBS-ADMM) and symmetric Gauss-Seidel ADMM (sGS-ADMM) do not have this divergence issue. It therefore seems that symmetrization can improve on the performance of the classical cyclic order. In another recent work, cyclic CD (coordinate descent) was shown to be $\mathcal{O}(n^2)$ times slower than randomized versions in the worst case. A natural question arises: can the symmetrized orders achieve a faster convergence rate than the cyclic order, or even get close to the randomized versions? In this paper, we give a negative answer to this question. We show that both the Gaussian Back Substitution and the symmetric Gauss-Seidel orders suffer from the same slow convergence as the cyclic order in the worst case. In particular, we prove that for unconstrained problems they can be $\mathcal{O}(n^2)$ times slower than randomized CD (R-CD). For linearly constrained problems with a quadratic objective, we empirically show that GBS-ADMM and sGS-ADMM can converge roughly $\mathcal{O}(n^2)$ times slower than randomly permuted ADMM.
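The unconstrained comparison can be illustrated with a small experiment. The following is a minimal Python sketch, not the authors' code. It runs exact coordinate minimization on $f(x) = \frac{1}{2} x^T A x$ under four update orders: cyclic, symmetric (one sweep is read here as a forward pass followed by a backward pass, an assumed interpretation of the symmetric Gauss-Seidel order), random permutation, and i.i.d. randomized (R-CD). The test matrix $A = \delta I + (1-\delta)\mathbf{1}\mathbf{1}^T$ is an assumption chosen as a plausibly hard instance: it has a unit diagonal and nearly identical columns when $\delta$ is small. It is not necessarily the construction used in the paper. All function names and parameters (`make_hard_quadratic`, `sweep_order`, `updates_to_tol`, `tol`, `max_updates`) are hypothetical. The Gaussian back substitution correction step and the ADMM setting are not modeled. The sketch counts coordinate updates until $f$ falls below a relative tolerance, which illustrates the gap but proves nothing about worst-case rates.

```python
import random


def make_hard_quadratic(n, delta):
    """Assumed test matrix A = delta*I + (1 - delta)*11^T (unit diagonal)."""
    return [[1.0 if i == j else 1.0 - delta for j in range(n)] for i in range(n)]


def objective(A, x):
    """f(x) = 0.5 * x^T A x; for positive definite A the minimum is f* = 0 at x* = 0."""
    n = len(x)
    return 0.5 * sum(x[i] * sum(A[i][j] * x[j] for j in range(n)) for i in range(n))


def exact_cd_step(A, x, i):
    """Minimize f exactly over coordinate i: x_i <- -(sum_{j != i} A_ij x_j) / A_ii."""
    off_diag = sum(A[i][j] * x[j] for j in range(len(x)) if j != i)
    x[i] = -off_diag / A[i][i]


def sweep_order(rule, n, rng):
    """Coordinates visited in one sweep under each (hypothetically named) rule."""
    if rule == "cyclic":
        return list(range(n))
    if rule == "symmetric":
        # Assumed sGS reading: forward pass 0..n-1, then backward pass n-2..0 (2n - 1 updates).
        return list(range(n)) + list(range(n - 2, -1, -1))
    if rule == "random_perm":
        # A fresh uniformly random permutation for every sweep.
        return rng.sample(range(n), n)
    if rule == "randomized":
        # R-CD: n coordinates drawn i.i.d. uniformly, with replacement.
        return [rng.randrange(n) for _ in range(n)]
    raise ValueError(f"unknown rule: {rule}")


def updates_to_tol(A, x0, rule, tol=1e-6, max_updates=10**6, seed=0):
    """Count coordinate updates until f(x) <= tol * f(x0), checked after each sweep."""
    rng = random.Random(seed)
    x = list(x0)
    f0 = objective(A, x)
    updates = 0
    while objective(A, x) > tol * f0 and updates < max_updates:
        for i in sweep_order(rule, len(x), rng):
            exact_cd_step(A, x, i)
            updates += 1
    return updates


n, delta = 20, 0.1
A = make_hard_quadratic(n, delta)
init_rng = random.Random(1)
x0 = [init_rng.gauss(0.0, 1.0) for _ in range(n)]

results = {rule: updates_to_tol(A, x0, rule)
           for rule in ("cyclic", "symmetric", "random_perm", "randomized")}
for rule, k in results.items():
    print(f"{rule:>12}: {k} coordinate updates (about {k / n:.0f} passes over n)")
```

The sketch counts coordinate updates rather than sweeps because a symmetric sweep performs 2n - 1 updates while the other rules perform n. Rerunning with several values of n (for example 10, 20, 40) and comparing the ratios is one way to probe the n-dependence empirically. The sketch makes no claim about the exact constants.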
