A Universal Catalyst for First-Order Optimization

We introduce a generic scheme for accelerating first-order optimization methods in the sense of Nesterov, building upon a new analysis of the accelerated proximal point algorithm. Our approach minimizes a convex objective by approximately solving a sequence of well-chosen auxiliary problems, leading to faster convergence. The strategy applies to a large class of algorithms, including gradient descent, block coordinate descent, SAG, SAGA, SDCA, SVRG, Finito/MISO, and their proximal variants. For all of these methods, we provide acceleration and explicit support for non-strongly convex objectives. Beyond the theoretical speed-up, we show that acceleration is useful in practice, especially for ill-conditioned problems, where we measure significant improvements.
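
As a rough illustration of the scheme described above, here is a minimal Python sketch of one way the outer acceleration loop can be organized: each outer step approximately minimizes the auxiliary problem G_k(x) = f(x) + (kappa/2)||x - y_{k-1}||^2, with a fixed budget of plain gradient steps standing in for the method M being accelerated, followed by a Nesterov-style extrapolation. The function name `catalyst`, the fixed inner-iteration budget, and all parameter defaults are illustrative assumptions, not the paper's exact algorithm, which instead couples the inner accuracy of each subproblem to the outer convergence analysis.

```python
import numpy as np

def catalyst(f_grad, x0, kappa=1.0, mu=0.0, n_outer=50, n_inner=100,
             inner_lr=1e-3):
    """Sketch of an accelerated proximal-point ("catalyst") outer loop.

    f_grad : callable returning the gradient of the convex objective f.
    kappa  : smoothing parameter of the auxiliary problems.
    mu     : strong-convexity modulus of f (0 for merely convex f).
    """
    x_prev = x0.copy()
    y = x0.copy()                       # current prox center
    q = mu / (mu + kappa)               # inverse condition number of G_k
    alpha = np.sqrt(q) if q > 0 else 1.0
    for _ in range(n_outer):
        # Approximately solve the auxiliary problem
        #   G_k(x) = f(x) + (kappa/2) * ||x - y||^2
        # by gradient descent, warm-started at the previous iterate.
        x = x_prev.copy()
        for _ in range(n_inner):
            x -= inner_lr * (f_grad(x) + kappa * (x - y))
        # Update alpha_k as the positive root of
        #   alpha_k^2 = (1 - alpha_k) * alpha^2 + q * alpha_k.
        a2 = alpha ** 2
        alpha_next = 0.5 * (q - a2 + np.sqrt((q - a2) ** 2 + 4.0 * a2))
        beta = alpha * (1.0 - alpha) / (a2 + alpha_next)
        # Extrapolate to obtain the next prox center.
        y = x + beta * (x - x_prev)
        x_prev, alpha = x, alpha_next
    return x_prev

# Toy usage on a least-squares objective f(x) = 0.5 * ||A x - b||^2;
# the inner step size is set from the gradient's Lipschitz constant.
rng = np.random.default_rng(0)
A = rng.standard_normal((200, 50))
b = rng.standard_normal(200)
L = np.linalg.norm(A, 2) ** 2
x_hat = catalyst(lambda x: A.T @ (A @ x - b), np.zeros(50),
                 kappa=1.0, inner_lr=1.0 / (L + 1.0))
```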
