Variance Reduction via Primal-Dual Accelerated Dual Averaging for Nonsmooth Convex Finite-Sums

Structured nonsmooth convex finite-sum optimization appears in many machine learning applications, including support vector machines and least absolute deviation. For the primal-dual formulation of this problem, we propose a novel algorithm called Variance Reduction via Primal-Dual Accelerated Dual Averaging (VRPDA²). In the nonsmooth and general convex setting, VRPDA² has the overall complexity O(nd log min{1/ε, n} + d/ε) in terms of the primal-dual gap, where n denotes the number of samples, d the dimension of the primal variables, and ε the desired accuracy. In the nonsmooth and strongly convex setting, the overall complexity of VRPDA² becomes O(nd log min{1/ε, n} + d/√ε) in terms of both the primal-dual gap and the distance between the iterate and the optimal solution. Both these results for VRPDA² improve significantly on the state-of-the-art complexity estimates, which are O(nd log min{1/ε, n} + √n·d/ε) for the nonsmooth and general convex setting and O(nd log min{1/ε, n} + √n·d/√ε) for the nonsmooth and strongly convex setting, with a simpler and more straightforward algorithm and analysis. Moreover, both complexities are better than lower bounds for general convex finite-sum optimization, because our approach makes use of additional, commonly occurring structure. Numerical experiments reveal competitive performance of VRPDA² compared to state-of-the-art approaches.
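For concreteness, the following is a minimal sketch of the kind of structured primal-dual formulation referred to above, assuming the standard linear-composition form with data vectors a_i; the symbols A, f_i, f_i^*, and g below are illustrative and are not fixed by this abstract:

\[
\min_{x \in \mathbb{R}^d} \; \frac{1}{n}\sum_{i=1}^{n} f_i\big(\langle a_i, x \rangle\big) + g(x)
\quad\Longleftrightarrow\quad
\min_{x \in \mathbb{R}^d} \max_{y \in \mathbb{R}^n} \; \frac{1}{n}\langle A x, y \rangle \;-\; \frac{1}{n}\sum_{i=1}^{n} f_i^*(y_i) \;+\; g(x),
\]

where A is the matrix whose rows are a_i^T, each f_i is convex but possibly nonsmooth (for example, the hinge loss for support vector machines or the absolute loss for least absolute deviation), f_i^* is its convex conjugate, and g is a convex regularizer. Under this assumed structure, the primal-dual gap of the saddle-point problem on the right is the accuracy measure to which ε refers.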
