Convergence Rates of Inexact Proximal-Gradient Methods for Convex Optimization

We consider the problem of optimizing the sum of a smooth convex function and a non-smooth convex function using proximal-gradient methods, where an error is present in the calculation of the gradient of the smooth term or in the proximity operator with respect to the non-smooth term. We show that both the basic proximal-gradient method and the accelerated proximal-gradient method achieve the same convergence rate as in the error-free case, provided that the errors decrease at appropriate rates. In experiments on a set of structured sparsity problems, letting the errors decrease at these rates performs as well as or better than using a carefully chosen fixed error level.
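
As a rough illustration of the setting described above, the following Python sketch runs a basic proximal-gradient iteration on an l1-regularized least-squares problem while deliberately perturbing the gradient with an error whose norm shrinks over the iterations. Everything specific here is an assumption made for illustration and is not taken from the paper: the lasso objective, the function names (`soft_threshold`, `inexact_proximal_gradient`, `objective`), the synthetic random perturbation, and the `error_scale / k**decay` schedule. The proximity operator is computed exactly; only the gradient is inexact.

```python
import numpy as np


def soft_threshold(v, t):
    """Exact proximity operator of t * ||.||_1 (elementwise shrinkage)."""
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)


def objective(A, b, lam, x):
    """Composite objective: smooth least-squares term plus l1 penalty."""
    return 0.5 * np.sum((A @ x - b) ** 2) + lam * np.sum(np.abs(x))


def inexact_proximal_gradient(A, b, lam, n_iters=500,
                              error_scale=1.0, decay=2.0, seed=0):
    """Basic proximal-gradient method with a perturbed gradient.

    Assumption (illustrative only): at iteration k the gradient error is a
    random direction rescaled to norm error_scale / k**decay.
    Setting error_scale=0 recovers the error-free method.
    """
    rng = np.random.default_rng(seed)
    # Lipschitz constant of the gradient of 0.5 * ||Ax - b||^2.
    L = np.linalg.norm(A, 2) ** 2
    x = np.zeros(A.shape[1])
    for k in range(1, n_iters + 1):
        grad = A.T @ (A @ x - b)
        # Synthetic gradient error with a prescribed, decreasing norm.
        noise = rng.standard_normal(x.shape)
        noise *= error_scale / (k ** decay * np.linalg.norm(noise))
        # Gradient step with the inexact gradient, then the exact prox step.
        x = soft_threshold(x - (grad + noise) / L, lam / L)
    return x


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    A = rng.standard_normal((40, 10))
    b = rng.standard_normal(40)
    lam = 0.5
    x_exact = inexact_proximal_gradient(A, b, lam, error_scale=0.0)
    x_inexact = inexact_proximal_gradient(A, b, lam, error_scale=1.0)
    print("exact objective:  ", objective(A, b, lam, x_exact))
    print("inexact objective:", objective(A, b, lam, x_inexact))
```

Comparing the two printed objective values for different choices of `decay` gives a quick feel for how the error schedule affects the final accuracy; it is an informal experiment, not a reproduction of the paper's results.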
