Coordinate Linear Variance Reduction for Generalized Linear Programming

We study a class of generalized linear programs (GLP) in a large-scale setting, which includes a simple, possibly nonsmooth, convex regularizer and simple convex set constraints. By reformulating (GLP) as an equivalent convex-concave min-max problem, we show that the linear structure in the problem can be used to design an efficient, scalable first-order algorithm, to which we give the name \emph{Coordinate Linear Variance Reduction} (\textsc{clvr}; pronounced "clever"). \textsc{clvr} yields improved complexity results for (GLP) that depend on the max row norm of the linear constraint matrix in (GLP) rather than its spectral norm. When the regularization terms and constraints are separable, \textsc{clvr} admits an efficient lazy update strategy that makes its complexity bounds scale with the number of nonzero elements of the linear constraint matrix in (GLP) rather than the matrix dimensions. Furthermore, for the special case of linear programs, by exploiting sharpness, we propose a restart scheme for \textsc{clvr} that obtains empirical linear convergence. We then show that Distributionally Robust Optimization (DRO) problems with ambiguity sets based on both $f$-divergence and Wasserstein metrics can be reformulated as (GLP) by introducing sparsely connected auxiliary variables. We complement our theoretical guarantees with numerical experiments that verify our algorithm's practical effectiveness in terms of wall-clock time and number of data passes.
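For concreteness, the problem template the abstract refers to can be sketched as follows. This is an assumed standard form (the symbols $c$, $A$, $b$, $g$, $\mathcal{X}$ are our notation, not taken from the paper body): a linear objective plus a simple regularizer over a simple set, coupled by linear constraints, together with its Lagrangian min-max reformulation:
\[
\min_{x \in \mathcal{X}} \; c^\top x + g(x) \quad \text{s.t.} \quad Ax \le b
\qquad\Longleftrightarrow\qquad
\min_{x \in \mathcal{X}} \, \max_{y \ge 0} \; c^\top x + g(x) + \langle y,\, Ax - b \rangle.
\]
In this notation, the complexity claim above is that \textsc{clvr}'s bounds depend on $\max_i \|a_i\|$ (the largest row norm of $A$) rather than $\|A\|_2$ (its spectral norm), and, with lazy updates, on $\mathrm{nnz}(A)$ rather than the dimensions $m \times n$.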
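The lazy-update and restart ideas can be illustrated with a short sketch. The code below is emphatically not the paper's \textsc{clvr} method (which uses variance reduction and dual averaging); it is a minimal restarted stochastic primal-dual coordinate loop for an inequality-constrained LP, with placeholder step sizes `tau` and `sigma` and a fixed epoch length standing in for the actual sharpness-based restart criterion. Its only purpose is to show why the cost of one coordinate step scales with the nonzeros of the sampled constraint row.

```python
# Illustrative sketch only: a restarted stochastic primal-dual coordinate loop
# for min_{x >= 0} c @ x subject to A @ x <= b. It is NOT the paper's CLVR
# algorithm; the step sizes and restart rule here are placeholder assumptions.
import numpy as np
import scipy.sparse as sp

def restarted_pd_sketch(A, b, c, epochs=20, inner=5000, tau=1e-2, sigma=1e-2, seed=0):
    rng = np.random.default_rng(seed)
    A = sp.csr_matrix(A)
    m, n = A.shape
    x, y = np.zeros(n), np.zeros(m)
    for _ in range(epochs):                  # one epoch between restarts
        x_avg = np.zeros(n)
        for _ in range(inner):
            i = rng.integers(m)              # sample one constraint row
            cols = A.indices[A.indptr[i]:A.indptr[i + 1]]  # support of row i
            vals = A.data[A.indptr[i]:A.indptr[i + 1]]
            r = vals @ x[cols] - b[i]        # sparse residual of constraint i
            y[i] = max(0.0, y[i] + sigma * r)              # dual ascent step
            # "Lazy" primal step: only coordinates in the row's support move,
            # so the per-iteration cost is O(nnz of row i), not O(n).
            x[cols] = np.maximum(0.0, x[cols] - tau * (c[cols] + y[i] * vals))
            x_avg += x / inner
        x = x_avg                            # restart from the averaged iterate
    return x, y
```

In a real implementation the running average would itself be maintained lazily (a coordinate's contribution is settled only when it is next touched), which is what lets the overall complexity scale with $\mathrm{nnz}(A)$; the sketch keeps a dense average for clarity.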
