Exploiting Strong Convexity from Data with Primal-Dual First-Order Algorithms

We consider empirical risk minimization of linear predictors with convex loss functions. Such problems can be reformulated as convex-concave saddle-point problems and are thus well suited to primal-dual first-order algorithms. However, primal-dual algorithms often require explicit strongly convex regularization to obtain fast linear convergence, and the required dual proximal mapping may not admit a closed-form or efficient solution. In this paper, we develop both batch and randomized primal-dual algorithms that can adaptively exploit strong convexity from the data and achieve linear convergence even without regularization. We also present dual-free variants of these adaptive primal-dual algorithms that do not require computing the dual proximal mapping; these are especially suitable for logistic regression.
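For concreteness, here is a minimal sketch of the standard saddle-point reformulation the abstract refers to; the symbols a_i, \phi_i, g, x, y are our illustrative notation, not quoted from the paper. Given data points a_1, \dots, a_n \in \mathbb{R}^d, convex losses \phi_i, and a regularizer g, the ERM problem and its convex-concave reformulation via the conjugates \phi_i^* read

\[
\min_{x \in \mathbb{R}^d} \ \frac{1}{n}\sum_{i=1}^{n} \phi_i(a_i^\top x) + g(x)
\;=\;
\min_{x \in \mathbb{R}^d} \max_{y \in \mathbb{R}^n} \ \frac{1}{n}\sum_{i=1}^{n} \bigl( y_i\, a_i^\top x - \phi_i^*(y_i) \bigr) + g(x),
\]

where \phi_i^*(y_i) = \sup_{z} \{ y_i z - \phi_i(z) \} is the convex conjugate, and the equality holds when each \phi_i is convex and lower semicontinuous. The dual proximal mapping mentioned above is \mathrm{prox}_{\phi_i^*}. For the logistic loss \phi_i(z) = \log(1 + e^{-b_i z}) with label b_i \in \{\pm 1\} (again our notation), the conjugate is a binary-entropy-type function whose proximal mapping has no closed form, which is what makes dual-free variants attractive in that setting.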
