Exploiting Strong Convexity from Data with Primal-Dual First-Order Algorithms

We consider empirical risk minimization of linear predictors with convex loss functions. Such problems can be reformulated as convex-concave saddle-point problems and are thus well suited to primal-dual first-order algorithms. However, primal-dual algorithms often require explicit strongly convex regularization to obtain fast linear convergence, and the required dual proximal mapping may not admit a closed-form or efficient solution. In this paper, we develop both batch and randomized primal-dual algorithms that can adaptively exploit strong convexity from the data and achieve linear convergence even without regularization. We also present dual-free variants of these adaptive primal-dual algorithms that do not require computing the dual proximal mapping; these are especially suitable for logistic regression.
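For concreteness, here is a minimal sketch of the standard saddle-point reformulation the abstract refers to; the symbols a_i, \phi_i, g, x, y are our illustrative notation, not quoted from the paper. Given data points a_1, \dots, a_n \in \mathbb{R}^d, convex losses \phi_i, and a regularizer g, the ERM problem and its convex-concave reformulation via the conjugates \phi_i^* read

\[
\min_{x \in \mathbb{R}^d} \ \frac{1}{n}\sum_{i=1}^{n} \phi_i(a_i^\top x) + g(x)
\;=\;
\min_{x \in \mathbb{R}^d} \max_{y \in \mathbb{R}^n} \ \frac{1}{n}\sum_{i=1}^{n} \bigl( y_i\, a_i^\top x - \phi_i^*(y_i) \bigr) + g(x),
\]

where \phi_i^*(y_i) = \sup_{z} \{ y_i z - \phi_i(z) \} is the convex conjugate, and the equality holds when each \phi_i is convex and lower semicontinuous. The dual proximal mapping mentioned above is \mathrm{prox}_{\phi_i^*}. For the logistic loss \phi_i(z) = \log(1 + e^{-b_i z}) with label b_i \in \{\pm 1\} (again our notation), the conjugate is a binary-entropy-type function whose proximal mapping has no closed form, which is what makes dual-free variants attractive in that setting.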
