A Generic Acceleration Framework for Stochastic Composite Optimization

In this paper, we introduce various mechanisms to obtain accelerated first-order stochastic optimization algorithms when the objective function is convex or strongly convex. Specifically, we extend the Catalyst approach, originally designed for deterministic objectives, to the stochastic setting. Given an optimization method with mild convergence guarantees for strongly convex problems, the challenge is to accelerate convergence towards a noise-dominated region and then achieve convergence with an optimal worst-case complexity that depends on the noise variance of the gradients. A side contribution of our work is a generic analysis that handles inexact proximal operators, providing new insight into the robustness of stochastic algorithms when the proximal operator cannot be computed exactly.
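To make the acceleration mechanism concrete, the sketch below illustrates a Catalyst-style outer loop in Python: each outer iteration approximately minimizes a quadratically regularized subproblem anchored at an extrapolated point, then applies Nesterov-type momentum. This is a minimal sketch under stated assumptions, not the paper's algorithm: the interface of `inner_solver`, the name `f_grad`, the inner stopping criterion, and the choice of the smoothing parameter `kappa` are all illustrative placeholders.

```python
import numpy as np

def catalyst_outer_loop(f_grad, inner_solver, x0, kappa, mu=0.0, n_outer=100):
    """Sketch of a Catalyst-style outer loop (inexact proximal point with momentum).

    At each outer iteration we approximately minimize the proximal model
        h_k(x) = f(x) + (kappa / 2) * ||x - y_{k-1}||^2
    using a user-supplied `inner_solver(grad_oracle, warm_start)`, then apply
    Nesterov-style extrapolation.  The inner accuracy schedule and the tuning
    of `kappa` are problem dependent and omitted here (illustrative assumption).
    """
    x_prev = x0.copy()
    y = x0.copy()
    q = mu / (mu + kappa)                 # strong-convexity ratio of the proximal model
    alpha = np.sqrt(q) if q > 0 else 1.0  # initial extrapolation parameter
    for _ in range(n_outer):
        # Inexact proximal step: gradient of the regularized model, warm-started at y.
        x = inner_solver(lambda z: f_grad(z) + kappa * (z - y), y)
        # Update the extrapolation sequence: alpha_new solves
        # alpha_new^2 = (1 - alpha_new) * alpha^2 + q * alpha_new.
        alpha_new = 0.5 * (q - alpha**2 +
                           np.sqrt((q - alpha**2) ** 2 + 4 * alpha**2))
        beta = alpha * (1 - alpha) / (alpha**2 + alpha_new)
        y = x + beta * (x - x_prev)       # momentum / extrapolation step
        x_prev, alpha = x, alpha_new
    return x_prev
```

Any first-order method with linear convergence on strongly convex problems (e.g., proximal SGD with variance reduction) could in principle play the role of `inner_solver`; in the stochastic setting the inner subproblems can only be solved inexactly, which is precisely the regime the paper's analysis addresses.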
