Incremental proximal methods for large scale convex optimization

We consider the minimization of a sum $\sum_{i=1}^m f_i(x)$ consisting of a large number of convex component functions $f_i$. For this problem, incremental methods consisting of gradient or subgradient iterations applied to single components have proved very effective. We propose new incremental methods, consisting of proximal iterations applied to single components, as well as combinations of gradient, subgradient, and proximal iterations. We provide a convergence and rate of convergence analysis of a variety of such methods, including some that involve randomization in the selection of components. We also discuss applications in a few contexts, including signal processing and inference/machine learning.
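As a rough illustration of a proximal iteration applied to one component at a time, the sketch below assumes least-squares components $f_i(x) = \tfrac{1}{2}(a_i^\top x - b_i)^2$, for which the proximal step $\arg\min_z \{ f_i(z) + \tfrac{1}{2\alpha}\|z - x\|^2 \}$ has the closed form $x - \alpha\, a_i (a_i^\top x - b_i)/(1 + \alpha \|a_i\|^2)$. The component type, the cyclic order, the constant stepsize `alpha`, and the function names are all assumptions made for this example, not choices taken from the paper.

```python
def prox_least_squares(x, a, b, alpha):
    """Proximal step for f(z) = 0.5 * (a.z - b)**2, started from the point x.

    Closed form: x - alpha * a * (a.x - b) / (1 + alpha * ||a||^2).
    """
    residual = sum(ai * xi for ai, xi in zip(a, x)) - b
    norm_sq = sum(ai * ai for ai in a)
    scale = alpha * residual / (1.0 + alpha * norm_sq)
    return [xi - scale * ai for ai, xi in zip(a, x)]


def incremental_proximal(rows, targets, x0, alpha=1.0, passes=50):
    """Cycle through the components, applying one proximal step to each.

    Hypothetical helper: cyclic order and a constant stepsize are
    simplifying assumptions for this sketch.
    """
    x = list(x0)
    for _ in range(passes):
        for a, b in zip(rows, targets):
            x = prox_least_squares(x, a, b, alpha)
    return x


# Small consistent system whose unique solution is x = (1, 2).
rows = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
targets = [1.0, 2.0, 3.0]
x = incremental_proximal(rows, targets, [0.0, 0.0])
print(x)
```

On this consistent toy system the iterates approach the common minimizer $(1, 2)$. A randomized variant would draw the component index at random at each step instead of cycling through the components in order.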
