A Primer on Coordinate Descent Algorithms

This monograph presents a class of algorithms called coordinate descent algorithms for mathematicians, statisticians, and engineers outside the field of optimization. This class of algorithms has recently gained popularity due to its effectiveness in solving large-scale optimization problems in machine learning, compressed sensing, image processing, and computational statistics. Coordinate descent algorithms solve optimization problems by successively minimizing along each coordinate or coordinate hyperplane, an approach well suited to parallel and distributed computing. Avoiding detailed technicalities and proofs, this monograph provides the relevant theory and examples that practitioners need to apply coordinate descent effectively to modern problems in data science and engineering.
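
To make the idea concrete, here is a minimal sketch of cyclic coordinate descent for the least-squares problem min_x (1/2)||Ax - b||^2, assuming exact minimization along each coordinate. The function name, stopping rule, and test problem are illustrative choices, not taken from the monograph.

```python
# A minimal sketch of cyclic coordinate descent for least squares;
# illustrative only, not the monograph's reference implementation.
import numpy as np

def coordinate_descent_ls(A, b, num_epochs=100, tol=1e-10):
    m, n = A.shape
    x = np.zeros(n)
    r = A @ x - b                  # residual Ax - b, kept up to date
    col_sq = (A ** 2).sum(axis=0)  # ||A[:, i]||^2 for each coordinate
    for _ in range(num_epochs):
        x_old = x.copy()
        for i in range(n):         # one cyclic pass over the coordinates
            if col_sq[i] == 0.0:
                continue
            # Exact minimization along coordinate i:
            # argmin_d f(x + d e_i) gives d = -A[:, i]^T r / ||A[:, i]||^2.
            step = A[:, i] @ r / col_sq[i]
            x[i] -= step
            r -= step * A[:, i]    # update the residual in O(m) work
        if np.linalg.norm(x - x_old) <= tol:
            break
    return x

# Usage: solve a small random least-squares problem and check the
# normal equations A^T (Ax - b) = 0 at the computed solution.
rng = np.random.default_rng(0)
A = rng.standard_normal((50, 10))
b = rng.standard_normal(50)
x_cd = coordinate_descent_ls(A, b)
print(np.allclose(A.T @ (A @ x_cd - b), 0, atol=1e-6))
```

Each inner update touches only one column of A, which is what makes coordinate updates cheap per step and amenable to the parallel and distributed variants discussed in the monograph.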
