Coordinate Methods for Matrix Games

We develop primal-dual coordinate methods for solving bilinear saddle-point problems of the form $\min\nolimits_{x\in \mathcal{X}}\max\nolimits_{y\in \mathcal{Y}}y^{\top}Ax$, which include linear programming, classification, and regression as special cases. Our methods push existing fully stochastic sublinear methods and variance-reduced methods towards their limits in terms of per-iteration complexity and sample complexity. We obtain nearly-constant per-iteration complexity by designing efficient data structures that leverage Taylor approximations to the exponential and a binomial heap. We improve sample complexity via low-variance gradient estimators that use dynamic sampling distributions depending on both the iterates and the magnitudes of the matrix entries. Our runtime bounds improve upon those of existing primal-dual methods by a factor depending on sparsity measures of the $m \times n$ matrix $A$. For example, when rows and columns have constant $\ell_{1}/\ell_{2}$ norm ratios, we obtain improvements by a factor of $m+n$ in the fully stochastic setting and $\sqrt{m+n}$ in the variance-reduced setting. We apply our methods to computational geometry problems, namely minimum enclosing ball, maximum inscribed ball, and linear regression, and obtain improved complexity bounds. For linear regression with an elementwise nonnegative matrix, our guarantees improve on those of exact gradient methods by a factor of $\sqrt{\text{nnz}(A)/(m+n)}$.
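To make the estimator idea concrete, consider estimating the $x$-side gradient $A^{\top}y$ of $y^{\top}Ax$ from a single sampled matrix entry. The sketch below is a minimal, hypothetical instance of such an estimator, not the paper's exact construction: it draws a row with probability proportional to the iterate magnitudes $|y_i|$ and a column with probability proportional to the entry magnitudes $|A_{ij}|$, then importance-weights so that the one-coordinate estimate is unbiased. All names here are illustrative.

```python
import numpy as np

def sample_grad_x(A, y, rng):
    """One-sample, unbiased sketch of the x-gradient A^T y of y^T A x.

    Illustrative only: row i is drawn with probability proportional to
    |y_i| (iterate-dependent), column j with probability proportional
    to |A_ij| (entry-magnitude-dependent); importance weighting makes
    the sparse estimate g * e_j unbiased. Assumes y != 0 and that the
    sampled row of A is not identically zero.
    """
    p = np.abs(y) / np.abs(y).sum()        # row distribution from the iterate
    i = rng.choice(len(y), p=p)
    q = np.abs(A[i]) / np.abs(A[i]).sum()  # column distribution from |A_ij|
    j = rng.choice(A.shape[1], p=q)
    g = A[i, j] * y[i] / (p[i] * q[j])     # importance weight cancels the bias
    return j, g                            # sparse estimate: value g on coordinate j

# Sanity check: averaging many samples recovers A^T y.
rng = np.random.default_rng(0)
A = rng.standard_normal((5, 3))
y = rng.standard_normal(5)
est = np.zeros(3)
n_samples = 200_000
for _ in range(n_samples):
    j, g = sample_grad_x(A, y, rng)
    est[j] += g / n_samples
print(np.allclose(est, A.T @ y, atol=0.1))
```

Unbiasedness follows because $\sum_{i,j} p_i q_{ij} \cdot \frac{A_{ij} y_i}{p_i q_{ij}} e_j = A^{\top}y$; the choice of sampling distributions only affects the variance, which is the quantity the paper's dynamic distributions are designed to control.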
