Iterative regularization for convex regularizers

We study iterative regularization for linear models when the bias is convex but not necessarily strongly convex. We characterize the stability properties of a primal-dual gradient-based approach, analyzing its convergence in the presence of worst-case deterministic noise. As a main example, we specialize and illustrate the results for the problem of robust sparse recovery. Key to our analysis is a combination of ideas from regularization theory and optimization in the presence of errors. The theoretical results are complemented by experiments showing that state-of-the-art performance can be achieved with considerable computational speed-ups.
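To make the setting concrete, the sketch below implements a standard primal-dual (Chambolle-Pock-style) iteration for the basis pursuit formulation of sparse recovery, min ||x||_1 subject to Ax = b, where the iteration budget plays the role of the regularization parameter. This is a generic illustration of the algorithmic template under simple assumptions, not the paper's exact method; the function names, step-size choices, and the `n_iters` stopping budget are illustrative.

```python
import numpy as np

def soft_threshold(v, t):
    # Proximal operator of t * ||.||_1 (componentwise soft-thresholding).
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

def primal_dual_sparse_recovery(A, b, n_iters=300, sigma=None, tau=None):
    """Chambolle-Pock iterations for:  min ||x||_1  s.t.  Ax = b.

    With noisy data b, early stopping (the choice of n_iters) acts as
    the regularization parameter instead of an explicit penalty weight.
    """
    m, n = A.shape
    L = np.linalg.norm(A, 2)      # spectral norm; convergence needs sigma*tau*L**2 <= 1
    if sigma is None:
        sigma = 1.0 / L
    if tau is None:
        tau = 1.0 / L
    x = np.zeros(n)               # primal variable
    x_bar = np.zeros(n)           # extrapolated primal variable
    y = np.zeros(m)               # dual variable for the constraint Ax = b
    for _ in range(n_iters):
        # Dual ascent step on the linear constraint Ax = b.
        y = y + sigma * (A @ x_bar - b)
        # Primal proximal step on the l1 norm.
        x_new = soft_threshold(x - tau * (A.T @ y), tau)
        # Over-relaxation with theta = 1.
        x_bar = 2.0 * x_new - x
        x = x_new
    return x

if __name__ == "__main__":
    # Small synthetic example: recover a 5-sparse signal from noisy measurements.
    rng = np.random.default_rng(0)
    n, m, s = 200, 80, 5
    A = rng.standard_normal((m, n)) / np.sqrt(m)
    x_true = np.zeros(n)
    x_true[rng.choice(n, s, replace=False)] = rng.standard_normal(s)
    b = A @ x_true + 0.01 * rng.standard_normal(m)
    x_hat = primal_dual_sparse_recovery(A, b, n_iters=300)
    print("recovery error:", np.linalg.norm(x_hat - x_true))
```

In this sketch the number of iterations trades off data fit against noise amplification: too few iterations underfit, while running to full convergence on noisy data overfits the noise, which is the early-stopping phenomenon the analysis quantifies.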
