A random coordinate descent algorithm for optimization problems with composite objective function and linear coupled constraints

In this paper we propose a variant of the random coordinate descent method for solving linearly constrained convex optimization problems with composite objective functions. If the smooth part of the objective function has Lipschitz continuous gradient, then we prove that our method obtains an $\epsilon$-optimal solution in $\mathcal{O}(n^{2}/\epsilon)$ iterations, where $n$ is the number of blocks. For the class of problems with cheap coordinate derivatives we show that the new method is faster than methods based on full-gradient information. We also analyze the rate of convergence in probability. For strongly convex functions our method converges linearly. Extensive numerical tests confirm that on very large problems our method is numerically much more efficient than methods based on full-gradient information.
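To make the idea concrete, the following is a minimal sketch of a random pair-coordinate update under simplifying assumptions that are not taken from the paper: the nonsmooth part of the objective is dropped, there is a single coupling constraint $\sum_i x_i = \text{const}$, and the smooth part is the toy quadratic $f(x) = \tfrac{1}{2}\|x - c\|^2$, whose coordinate-wise Lipschitz constants are all 1. The function name `random_pair_descent` and its parameters are hypothetical. Each iteration picks two coordinates at random and moves along $e_i - e_j$, which keeps the iterate feasible while using only two partial derivatives.

```python
import random


def random_pair_descent(c, x0, iters=2000, seed=0):
    """Sketch: minimize 0.5 * ||x - c||^2 subject to sum(x) == sum(x0).

    Illustrative only: smooth objective, one linear coupling constraint,
    no composite (nonsmooth) term.
    """
    rng = random.Random(seed)
    x = list(x0)
    n = len(x)
    L = 1.0  # coordinate-wise Lipschitz constant of the gradient of this f

    for _ in range(iters):
        # Pick a random pair of distinct coordinates.
        i, j = rng.sample(range(n), 2)

        # Only two partial derivatives are needed per iteration.
        g_i = x[i] - c[i]
        g_j = x[j] - c[j]

        # Minimize the quadratic upper bound along the direction e_i - e_j:
        #   d * (g_i - g_j) + 0.5 * (L + L) * d^2
        d = -(g_i - g_j) / (L + L)

        # Moving along e_i - e_j leaves sum(x) unchanged, so x stays feasible.
        x[i] += d
        x[j] -= d
    return x


if __name__ == "__main__":
    c = [3.0, 1.0, -2.0, 4.0]
    x = random_pair_descent(c, [0.0] * 4)
    print(x, sum(x))
```

For this toy problem the constrained minimizer is $c$ shifted by a constant so that the entries sum to the prescribed value, which gives a simple check on the output. Handling a nonsmooth term would require solving a small two-variable subproblem in place of the closed-form step `d`.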
