Incremental Methods for Weakly Convex Optimization

We consider incremental algorithms for solving \emph{weakly convex} optimization problems, a broad class of (possibly nondifferentiable) nonconvex optimization problems. We analyze three such methods: incremental (sub)gradient descent, the incremental proximal point algorithm, and the incremental prox-linear algorithm. We show that all three algorithms converge at rate ${\cal O}(k^{-1/4})$ in the weakly convex setting, which extends the convergence theory of incremental methods from convex optimization to the nondifferentiable nonconvex regime. When the weakly convex function satisfies an additional regularity condition called \emph{sharpness}, we show that all three algorithms, equipped with a geometrically diminishing stepsize and an appropriate initialization, converge \emph{linearly} to the optimal solution set. We conduct experiments on robust matrix sensing and robust phase retrieval to illustrate the superior convergence properties of the three incremental methods.
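
For reference, the three updates take the following standard forms for minimizing $f(x) = \frac{1}{m}\sum_{i=1}^m f_i(x)$, where $i_k$ is the component visited at step $k$ and $\lambda_k$ is the stepsize (the notation here is ours, not quoted from the paper):
\[
\begin{aligned}
&\text{incremental subgradient:} && x_{k+1} = x_k - \lambda_k g_k, \quad g_k \in \partial f_{i_k}(x_k),\\
&\text{incremental proximal point:} && x_{k+1} = \operatorname*{argmin}_x \; f_{i_k}(x) + \tfrac{1}{2\lambda_k}\|x - x_k\|^2,\\
&\text{incremental prox-linear:} && x_{k+1} = \operatorname*{argmin}_x \; h_{i_k}\!\big(c_{i_k}(x_k) + \nabla c_{i_k}(x_k)(x - x_k)\big) + \tfrac{1}{2\lambda_k}\|x - x_k\|^2,
\end{aligned}
\]
where the prox-linear step assumes each component has the composite form $f_i = h_i \circ c_i$ with $h_i$ convex and $c_i$ smooth. As a runnable illustration of the first update, below is a minimal Python sketch (not the authors' implementation) of incremental subgradient descent with a geometrically diminishing stepsize, applied to the robust phase retrieval loss $f(x) = \frac{1}{m}\sum_{i=1}^m |\langle a_i, x\rangle^2 - b_i|$; the constants `lam0`, `rho`, `epochs`, the per-epoch decay schedule, and the synthetic data are illustrative assumptions, not values from the paper.

```python
import numpy as np

def incremental_subgradient(A, b, x0, lam0=0.005, rho=0.95, epochs=100):
    """Incremental subgradient descent for the robust phase retrieval loss
    f(x) = (1/m) * sum_i |<a_i, x>^2 - b_i|, cycling through the components
    and decaying the stepsize geometrically once per epoch: lam_t = lam0 * rho^t.
    The constants lam0, rho, and epochs are illustrative, not tuned values."""
    x = x0.copy()
    m = A.shape[0]
    lam = lam0
    for _ in range(epochs):
        for i in range(m):  # one cyclic pass, updating on a single component at a time
            r = A[i] @ x
            # a subgradient of x -> |<a_i, x>^2 - b_i| at the current iterate
            g = np.sign(r ** 2 - b[i]) * 2.0 * r * A[i]
            x -= lam * g
        lam *= rho  # geometrically diminishing stepsize
    return x

# Synthetic instance; the warm start near x_true mirrors the "appropriate
# initialization" assumed for linear convergence under sharpness.
rng = np.random.default_rng(0)
m, n = 200, 10
A = rng.standard_normal((m, n))
x_true = rng.standard_normal(n)
b = (A @ x_true) ** 2
x_hat = incremental_subgradient(A, b, x0=x_true + 0.1 * rng.standard_normal(n))
print(np.linalg.norm(x_hat - x_true))  # small if the run converged
```

The proximal point and prox-linear variants replace the subgradient step on each component with, respectively, a proximal step on $f_{i_k}$ itself and a proximal step on the convex model obtained by linearizing the smooth map inside the composite loss.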
