Variance Reduction for Matrix Games

We present a randomized primal-dual algorithm that solves the problem $\min_{x} \max_{y} y^\top A x$ to additive error $\epsilon$ in time $\mathrm{nnz}(A) + \sqrt{\mathrm{nnz}(A)\,n}/\epsilon$, for a matrix $A$ with larger dimension $n$ and $\mathrm{nnz}(A)$ nonzero entries. This improves on the best known exact gradient methods by a factor of $\sqrt{\mathrm{nnz}(A)/n}$ and is faster than fully stochastic gradient methods in the accurate and/or sparse regime $\epsilon \le \sqrt{n/\mathrm{nnz}(A)}$. Our results hold for $x, y$ in the simplex (matrix games, linear programming) and for $x$ in an $\ell_2$ ball and $y$ in the simplex (perceptron/SVM, minimum enclosing ball). Our algorithm combines Nemirovski's "conceptual prox-method" with a novel reduced-variance gradient estimator based on "sampling from the difference" between the current iterate and a reference point.
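
To make the "sampling from the difference" idea concrete, the sketch below (our illustration under simplifying assumptions, not code from the paper) shows an unbiased one-sample estimator for the $Ax$ half of the bilinear gradient operator $(A^\top y, -Ax)$; the $A^\top y$ half is symmetric, sampling rows of $A$ against $|y_i - [y_0]_i|$. The function name `sample_from_difference`, the dense NumPy representation, and the use of a single sample rather than a minibatch are our choices for exposition; the paper's full method runs such estimators inside Nemirovski's prox-method with a periodically updated reference point.

```python
import numpy as np

def sample_from_difference(A, x0, Ax0, x, rng):
    """Unbiased one-sample estimate of A @ x.

    Keeps the cached reference product Ax0 = A @ x0 and corrects it with a
    single column of A, sampled with probability proportional to
    |x_j - x0_j| ("sampling from the difference").
    """
    diff = x - x0
    total = np.abs(diff).sum()
    if total == 0.0:
        # Iterate coincides with the reference point: the cache is exact.
        return Ax0.copy()
    p = np.abs(diff) / total
    j = rng.choice(len(x), p=p)
    # Importance weighting gives unbiasedness:
    # E[A[:, j] * diff[j] / p[j]] = A @ diff, hence E[estimate] = A @ x.
    return Ax0 + A[:, j] * (diff[j] / p[j])

# Sanity check: averaging many estimates recovers A @ x.
rng = np.random.default_rng(0)
A = rng.standard_normal((4, 6))
x0 = np.full(6, 1 / 6)            # reference point on the simplex
x = rng.dirichlet(np.ones(6))     # current iterate on the simplex
Ax0 = A @ x0                      # full product, paid once per reference point
est = np.mean(
    [sample_from_difference(A, x0, Ax0, x, rng) for _ in range(20000)], axis=0
)
assert np.allclose(est, A @ x, atol=0.05)
```

The design rationale is visible in the correction term: its magnitude scales with $\|x - x_0\|_1$, so as the iterates approach the reference point the estimator's variance shrinks, rather than staying bounded away from zero as in naive stochastic gradient estimates.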
