On the Efficiency of Random Permutation for ADMM and Coordinate Descent

Random permutation is observed to be powerful for optimization algorithms: for multi-block ADMM (alternating direction method of multipliers), the classical cyclic version can diverge, while the randomly permuted version converges in practice; for BCD (block coordinate descent), the randomly permuted version is typically faster than other versions. In this paper, we provide strong theoretical evidence that random permutation has positive effects on ADMM and BCD, by analyzing randomly permuted ADMM (RP-ADMM) for solving linear systems of equations, and randomly permuted BCD (RP-BCD) for solving unconstrained quadratic problems. First, we prove that RP-ADMM converges in expectation for solving systems of linear equations. The key technical result is that the spectrum of the expected update matrix of RP-BCD lies in $(-1/3, 1)$, instead of the typical range $(-1, 1)$. Second, we establish expected convergence rates of RP-ADMM for solving linear systems and of RP-BCD for solving unconstrained quadratic problems. The expected rate of RP-BCD is $O(n)$ times better than the worst-case rate of cyclic BCD, thus establishing a gap of at least $O(n)$ between RP-BCD and cyclic BCD. To analyze RP-BCD, we conjecture a new matrix AM-GM (arithmetic mean-geometric mean) inequality and prove a weaker version of it.
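To make the two update orders concrete, the following is a minimal sketch of cyclic BCD and RP-BCD on an unconstrained quadratic $\min_x \tfrac{1}{2} x^T A x - b^T x$ with $A$ symmetric positive definite. It assumes blocks of size one (plain coordinate descent) and exact minimization over each coordinate; the function names, the random test problem, and the epoch count are illustrative choices, not taken from the paper.

```python
import numpy as np

def bcd_epoch(A, b, x, order):
    """One pass of exact coordinate minimization of 0.5*x'Ax - b'x, in place."""
    for i in order:
        # Exact minimizer over coordinate i with all other coordinates fixed.
        x[i] += (b[i] - A[i] @ x) / A[i, i]
    return x

def run_bcd(A, b, epochs, permute, seed=0):
    """Cyclic BCD (permute=False) or RP-BCD (permute=True), starting from zero."""
    rng = np.random.default_rng(seed)
    n = len(b)
    x = np.zeros(n)
    for _ in range(epochs):
        # RP-BCD draws a fresh permutation every epoch; cyclic BCD keeps 0..n-1.
        order = rng.permutation(n) if permute else np.arange(n)
        bcd_epoch(A, b, x, order)
    return x

def objective(A, b, x):
    return 0.5 * x @ A @ x - b @ x

# Hypothetical, well-conditioned test problem (assumption for illustration only).
rng = np.random.default_rng(1)
n = 20
M = rng.standard_normal((n, n))
A = M @ M.T + n * np.eye(n)      # symmetric positive definite
b = rng.standard_normal(n)

x_star = np.linalg.solve(A, b)   # exact minimizer, for reference
x_cyc = run_bcd(A, b, epochs=200, permute=False)
x_rp = run_bcd(A, b, epochs=200, permute=True)
```

On a well-conditioned instance like this one, both orders reach the minimizer, so the sketch shows only the mechanics of the two schemes. The $O(n)$ gap stated in the abstract concerns the worst-case rate of cyclic BCD and would not be expected to show up on a random problem of this kind.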
