A flexible coordinate descent method

We present a novel randomized block coordinate descent method for minimizing a convex composite objective function. The method uses (approximate) partial second-order (curvature) information, so that its performance is more robust on highly nonseparable or ill-conditioned problems. We call the method Flexible Coordinate Descent (FCD). At each iteration of FCD, a block of coordinates is sampled randomly, a quadratic model is formed over that block, and the model is minimized approximately (inexactly) to determine the search direction. An inexpensive line search is then employed to ensure a monotonic decrease in the objective function and the acceptance of large step sizes. We present several high-probability iteration complexity results showing that convergence of FCD is guaranteed theoretically. Finally, we present numerical results on large-scale problems to demonstrate the practical performance of the method.
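
As a rough illustration of the per-iteration structure described above (sample a block, build a block quadratic model with approximate curvature, minimize the model inexactly, then line search), the following Python sketch applies that pattern to a LASSO instance. Everything here is an illustrative assumption rather than the paper's actual algorithm: the function name `fcd_lasso_sketch`, the fixed contiguous blocks, the diagonal curvature approximation, the single proximal step used as the inexact model solve, and the plain backtracking rule.

```python
import numpy as np


def soft_threshold(z, t):
    """Proximal operator of t * ||.||_1 (component-wise soft-thresholding)."""
    return np.sign(z) * np.maximum(np.abs(z) - t, 0.0)


def fcd_lasso_sketch(A, b, lam, n_iters=500, block_size=10, seed=0):
    """Illustrative FCD-style loop on the LASSO problem
        min_x 0.5 * ||A x - b||^2 + lam * ||x||_1.

    Simplifying assumptions (not the paper's exact rules): blocks are fixed
    contiguous index sets sampled uniformly at random, the curvature
    information is a diagonal approximation of the block Hessian, the block
    model is minimized inexactly by a single scaled proximal step, and the
    line search is plain backtracking on the objective.
    """
    rng = np.random.default_rng(seed)
    n = A.shape[1]
    x = np.zeros(n)
    blocks = [np.arange(i, min(i + block_size, n)) for i in range(0, n, block_size)]

    def F(v):
        r = A @ v - b
        return 0.5 * r @ r + lam * np.abs(v).sum()

    for _ in range(n_iters):
        # 1. Sample a block of coordinates uniformly at random.
        S = blocks[rng.integers(len(blocks))]
        A_S = A[:, S]

        # 2. Form a quadratic model over the block: block gradient of the
        #    smooth part plus (approximate) diagonal curvature.
        g_S = A_S.T @ (A @ x - b)
        h_S = np.sum(A_S ** 2, axis=0) + 1e-8   # diag of A_S^T A_S

        # 3. Minimize the model inexactly: one curvature-scaled proximal step.
        d_S = soft_threshold(x[S] - g_S / h_S, lam / h_S) - x[S]

        # 4. Inexpensive backtracking line search for a monotone decrease.
        alpha, F_old = 1.0, F(x)
        while alpha > 1e-12:
            x_trial = x.copy()
            x_trial[S] = x[S] + alpha * d_S
            if F(x_trial) <= F_old:
                x = x_trial
                break
            alpha *= 0.5
    return x


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    A = rng.standard_normal((200, 500))
    x_true = rng.standard_normal(500) * (rng.random(500) < 0.05)
    b = A @ x_true + 0.01 * rng.standard_normal(200)
    x_hat = fcd_lasso_sketch(A, b, lam=0.1)
    print("nonzeros:", int(np.count_nonzero(x_hat)))
```

Replacing the diagonal `h_S` with the full block matrix `A_S.T @ A_S` and an iterative inner solver would be closer in spirit to the partial second-order information the abstract describes, at the cost of a more involved (and only approximately solved) block subproblem.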
