Randomized First-Order Methods for Saddle Point Optimization

In this paper, we present novel randomized algorithms for solving saddle point problems whose dual feasible region is given by the direct product of many convex sets. Our algorithms achieve O(1/N) and O(1/N^2) rates of convergence for general bilinear and smooth bilinear saddle point problems, respectively, based on a new primal-dual termination criterion, and each iteration of these algorithms needs to solve only one randomly selected dual subproblem. Moreover, these algorithms do not require strong convexity assumptions on the objective function or the incorporation of a strongly convex perturbation term, nor do they require the primal or dual feasible region to be bounded, or an estimate of the distance from the initial point to the set of optimal solutions. We show that when applied to linearly constrained problems, these randomized primal-dual (RPD) algorithms are equivalent to certain randomized variants of the alternating direction method of multipliers (ADMM), whereas a direct extension of ADMM does not necessarily converge when the number of blocks exceeds two.
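
To make the per-iteration structure concrete, the following Python/NumPy sketch illustrates a randomized primal-dual iteration of the kind described above: every iteration takes a primal step but updates only one randomly selected dual block. The specific saddle point instance (a quadratic primal term with unit-ball dual sets), the fixed step sizes tau and eta, and the ergodic averaging are illustrative assumptions for this sketch, not the paper's exact update rules or stepsize policies.

```python
import numpy as np

def project_ball(v, radius=1.0):
    """Euclidean projection onto the ball {u : ||u|| <= radius}."""
    norm = np.linalg.norm(v)
    return v if norm <= radius else v * (radius / norm)

def rpd_sketch(K_blocks, n, num_iters=1000, tau=0.1, eta=0.1, seed=0):
    """Minimal sketch of a randomized primal-dual (RPD) iteration for
        min_x max_{y_1,...,y_m}  0.5*||x||^2 + sum_i <K_i x, y_i>,
    where each dual block y_i lives in a unit Euclidean ball.
    The objective, step sizes, and averaging scheme are illustrative
    assumptions, not the algorithm exactly as stated in the paper.
    """
    rng = np.random.default_rng(seed)
    m = len(K_blocks)
    x = np.zeros(n)
    y = [np.zeros(K.shape[0]) for K in K_blocks]
    z = np.zeros(n)          # running aggregate z = sum_i K_i^T y_i
    x_avg = np.zeros(n)
    for t in range(1, num_iters + 1):
        # Update ONE randomly selected dual block with a projected
        # ascent step; all other dual blocks stay unchanged.
        i = rng.integers(m)
        y_new = project_ball(y[i] + tau * (K_blocks[i] @ x))
        z += K_blocks[i].T @ (y_new - y[i])  # keep the aggregate in sync
        y[i] = y_new
        # Primal descent step on 0.5*||x||^2 + <z, x>.
        x = x - eta * (x + z)
        # Ergodic average: O(1/N)-type rates are typically stated
        # for averaged iterates.
        x_avg += (x - x_avg) / t
    return x_avg, y
```

For example, rpd_sketch([np.random.randn(5, 20) for _ in range(10)], n=20) runs ten dual blocks of five rows each. Because the aggregate z is maintained incrementally, each iteration touches only the selected block K_i, so the per-iteration cost scales with the size of a single block rather than with the number of dual sets, which is the practical point of solving only one randomly selected dual subproblem per iteration.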
