Variance Reduction via Primal-Dual Accelerated Dual Averaging for Nonsmooth Convex Finite-Sums

Structured nonsmooth convex finite-sum optimization appears in many machine learning applications, including support vector machines and least absolute deviation. For the primal-dual formulation of this problem, we propose a novel algorithm called Variance Reduction via Primal-Dual Accelerated Dual Averaging (VRPDA²). In the nonsmooth and general convex setting, VRPDA² has the overall complexity O(nd log min{1/ε, n} + d/ε) in terms of the primal-dual gap, where n denotes the number of samples, d the dimension of the primal variables, and ε the desired accuracy. In the nonsmooth and strongly convex setting, the overall complexity of VRPDA² becomes O(nd log min{1/ε, n} + d/√ε) in terms of both the primal-dual gap and the distance between the iterate and the optimal solution. Both these results for VRPDA² improve significantly on the state-of-the-art complexity estimates, which are O(nd log min{1/ε, n} + √n·d/ε) for the nonsmooth and general convex setting and O(nd log min{1/ε, n} + √n·d/√ε) for the nonsmooth and strongly convex setting, with a simpler and more straightforward algorithm and analysis. Moreover, both complexities are better than lower bounds for general convex finite-sum optimization, because our approach makes use of additional, commonly occurring structure. Numerical experiments reveal competitive performance of VRPDA² compared to state-of-the-art approaches.
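For concreteness, the following is a minimal sketch of the kind of structured primal-dual formulation referred to above, assuming the standard linear-composition form with data vectors a_i; the symbols A, f_i, f_i^*, and g below are illustrative and are not fixed by this abstract:

\[
\min_{x \in \mathbb{R}^d} \; \frac{1}{n}\sum_{i=1}^{n} f_i\big(\langle a_i, x \rangle\big) + g(x)
\quad\Longleftrightarrow\quad
\min_{x \in \mathbb{R}^d} \max_{y \in \mathbb{R}^n} \; \frac{1}{n}\langle A x, y \rangle \;-\; \frac{1}{n}\sum_{i=1}^{n} f_i^*(y_i) \;+\; g(x),
\]

where A is the matrix whose rows are a_i^T, each f_i is convex but possibly nonsmooth (for example, the hinge loss for support vector machines or the absolute loss for least absolute deviation), f_i^* is its convex conjugate, and g is a convex regularizer. Under this assumed structure, the primal-dual gap of the saddle-point problem on the right is the accuracy measure to which ε refers.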
