Incremental Gradient, Subgradient, and Proximal Methods for Convex Optimization: A Survey

We survey incremental methods for minimizing a sum $\sum_{i=1}^{m} f_i(x)$ consisting of a large number of convex component functions $f_i$. Our methods consist of iterations applied to single components, and have proved very effective in practice. We introduce a unified algorithmic framework for a variety of such methods, some involving gradient and subgradient iterations, which are known, and some involving combinations of subgradient and proximal methods, which are new and offer greater flexibility in exploiting the special structure of $f_i$. We provide an analysis of the convergence and rate of convergence properties of these methods, including the advantages offered by randomization in the selection of components. We also survey applications in inference/machine learning, signal processing, and large-scale and distributed optimization.
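
As a rough illustration of what "iterations applied to single components" can look like, the sketch below minimizes a toy sum with components $f_i(x) = \frac{1}{2}(x - c_i)^2$, whose minimizer is the mean of the $c_i$. It is not taken from the paper: the function name `incremental_minimize`, the scalar problem, the diminishing stepsize $\alpha_k = 1/(k+1)$, and the parameter names are all assumptions chosen for brevity. Each iteration touches one component and applies either a gradient step or a proximal step, with the component chosen in cyclic or uniformly random order.

```python
import random


def incremental_minimize(centers, step="gradient", order="cyclic",
                         cycles=200, seed=0):
    """Toy incremental method for sum_i 0.5 * (x - c_i)^2 (illustrative only)."""
    rng = random.Random(seed)
    m = len(centers)
    x = 0.0
    k = 0  # total number of single-component iterations performed so far
    for _ in range(cycles):
        for j in range(m):
            # Component selection: fixed cyclic order or uniform random sampling.
            i = j if order == "cyclic" else rng.randrange(m)
            alpha = 1.0 / (k + 1)  # diminishing stepsize (an assumption)
            c = centers[i]
            if step == "gradient":
                # Gradient step: x <- x - alpha * f_i'(x), with f_i'(x) = x - c_i.
                x = x - alpha * (x - c)
            else:
                # Proximal step: x <- argmin_z f_i(z) + (z - x)^2 / (2 * alpha),
                # which has this closed form for the quadratic f_i used here.
                x = (x + alpha * c) / (1.0 + alpha)
            k += 1
    return x


if __name__ == "__main__":
    data = [1.0, 2.0, 3.0, 6.0]  # the sum is minimized at the mean, 3.0
    for step in ("gradient", "proximal"):
        for order in ("cyclic", "random"):
            print(step, order, incremental_minimize(data, step, order))
```

In this toy setting all four variants approach the mean of the data. For a general convex $f_i$ the proximal step has no such closed form and would itself require solving a small optimization problem, which is where the special structure of $f_i$ matters.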
