Variance Reduction for Matrix Games

We present a randomized primal-dual algorithm that solves the problem $\min_{x} \max_{y} y^\top A x$ to additive error $\epsilon$ in time $\mathrm{nnz}(A) + \sqrt{\mathrm{nnz}(A)\,n}/\epsilon$, for a matrix $A$ with larger dimension $n$ and $\mathrm{nnz}(A)$ nonzero entries. This improves on the best known exact gradient methods by a factor of $\sqrt{\mathrm{nnz}(A)/n}$ and is faster than fully stochastic gradient methods in the accurate and/or sparse regime $\epsilon \le \sqrt{n/\mathrm{nnz}(A)}$. Our results hold for $x, y$ in the simplex (matrix games, linear programming) and for $x$ in an $\ell_2$ ball and $y$ in the simplex (perceptron/SVM, minimum enclosing ball). Our algorithm combines Nemirovski's "conceptual prox-method" with a novel reduced-variance gradient estimator based on "sampling from the difference" between the current iterate and a reference point.
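
To make the "sampling from the difference" idea concrete, the sketch below (our illustration under simplifying assumptions, not code from the paper) shows an unbiased one-sample estimator for the $Ax$ half of the bilinear gradient operator $(A^\top y, -Ax)$; the $A^\top y$ half is symmetric, sampling rows of $A$ against $|y_i - [y_0]_i|$. The function name `sample_from_difference`, the dense NumPy representation, and the use of a single sample rather than a minibatch are our choices for exposition; the paper's full method runs such estimators inside Nemirovski's prox-method with a periodically updated reference point.

```python
import numpy as np

def sample_from_difference(A, x0, Ax0, x, rng):
    """Unbiased one-sample estimate of A @ x.

    Keeps the cached reference product Ax0 = A @ x0 and corrects it with a
    single column of A, sampled with probability proportional to
    |x_j - x0_j| ("sampling from the difference").
    """
    diff = x - x0
    total = np.abs(diff).sum()
    if total == 0.0:
        # Iterate coincides with the reference point: the cache is exact.
        return Ax0.copy()
    p = np.abs(diff) / total
    j = rng.choice(len(x), p=p)
    # Importance weighting gives unbiasedness:
    # E[A[:, j] * diff[j] / p[j]] = A @ diff, hence E[estimate] = A @ x.
    return Ax0 + A[:, j] * (diff[j] / p[j])

# Sanity check: averaging many estimates recovers A @ x.
rng = np.random.default_rng(0)
A = rng.standard_normal((4, 6))
x0 = np.full(6, 1 / 6)            # reference point on the simplex
x = rng.dirichlet(np.ones(6))     # current iterate on the simplex
Ax0 = A @ x0                      # full product, paid once per reference point
est = np.mean(
    [sample_from_difference(A, x0, Ax0, x, rng) for _ in range(20000)], axis=0
)
assert np.allclose(est, A @ x, atol=0.05)
```

The design rationale is visible in the correction term: its magnitude scales with $\|x - x_0\|_1$, so as the iterates approach the reference point the estimator's variance shrinks, rather than staying bounded away from zero as in naive stochastic gradient estimates.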
