Efficient Greedy Coordinate Descent for Composite Problems

Coordinate descent with random coordinate selection is the current state of the art for many large-scale optimization problems. However, greedy selection of the steepest coordinate on smooth problems can yield convergence rates independent of the dimension $n$, requiring $n$ times fewer iterations. In this paper, we consider greedy updates that are based on subgradients for a class of non-smooth composite problems, including $L_1$-regularized problems, SVMs, and related applications. For these problems we provide (i) the first linear convergence rates independent of $n$, and show that our greedy update rule provides speedups similar to those obtained in the smooth case. This was previously conjectured to be true for a stronger greedy coordinate selection strategy. Furthermore, we show that (ii) our new selection rule can be mapped to instances of maximum inner product search, allowing us to leverage standard nearest-neighbor algorithms to speed up the implementation. We demonstrate the validity of the approach through extensive numerical experiments.

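To make the greedy selection idea concrete, the following is a minimal sketch of subgradient-based greedy coordinate descent for the Lasso (least squares with $L_1$ regularization). It is illustrative only: the quadratic objective, the GS-s-style score, and all function names are assumptions for exposition, not the paper's exact algorithm or rates.

```python
import numpy as np

def greedy_cd_lasso(A, b, lam, n_iters=200):
    """Greedy (steepest) coordinate descent sketch for
    min_x 0.5*||Ax - b||^2 + lam*||x||_1.
    Illustrative assumption: coordinates are scored by the minimal-norm
    subgradient of the composite objective (a GS-s-style rule)."""
    n = A.shape[1]
    x = np.zeros(n)
    col_sq = (A ** 2).sum(axis=0)          # per-coordinate curvature constants
    residual = A @ x - b
    for _ in range(n_iters):
        grad = A.T @ residual              # gradient of the smooth part
        # Minimal-norm subgradient per coordinate:
        # |grad_i + lam*sign(x_i)| if x_i != 0, else max(|grad_i| - lam, 0).
        score = np.where(x != 0,
                         np.abs(grad + lam * np.sign(x)),
                         np.maximum(np.abs(grad) - lam, 0.0))
        i = int(np.argmax(score))          # greedy (steepest) coordinate
        if score[i] < 1e-10:               # approximate stationary point
            break
        # Exact coordinate minimization via soft-thresholding.
        z = x[i] - grad[i] / col_sq[i]
        x_new = np.sign(z) * max(abs(z) - lam / col_sq[i], 0.0)
        residual += A[:, i] * (x_new - x[i])
        x[i] = x_new
    return x

# Tiny usage example on synthetic data.
rng = np.random.default_rng(0)
A = rng.standard_normal((50, 20))
b = rng.standard_normal(50)
x_hat = greedy_cd_lasso(A, b, lam=0.5)
```

The `argmax` over coordinate scores is the step that, per the abstract, can be cast as a maximum inner product search, so that standard nearest-neighbor data structures can replace the full $O(n)$ scan over coordinates.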