On Acceleration with Noise-Corrupted Gradients

Accelerated algorithms have broad applications in large-scale optimization, due to their generality and fast convergence. However, their stability in the practical setting of noise-corrupted gradient oracles is not well understood. This paper provides two main technical contributions: (i) a new accelerated method AGD+ that generalizes Nesterov's AGD and improves on the recent method AXGD (Diakonikolas & Orecchia, 2018), and (ii) a theoretical study of accelerated algorithms under noisy and inexact gradient oracles, which is supported by numerical experiments. This study leverages the simplicity of AGD+ and its analysis to clarify the interaction between noise and acceleration and to suggest modifications to the algorithm that reduce the mean and variance of the error incurred due to the gradient noise.
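
To make the noisy-oracle setting concrete, the sketch below runs a standard Nesterov-type accelerated gradient method (not the paper's AGD+) with a gradient oracle corrupted by additive Gaussian noise on a toy quadratic. The test problem, step size, and noise scale sigma are illustrative assumptions, not taken from the paper.

```python
import numpy as np

# Illustrative setup (not from the paper): f(x) = 0.5 * x^T A x,
# a smooth convex quadratic with smoothness constant L = lambda_max(A).
rng = np.random.default_rng(0)
A = np.diag(np.linspace(0.1, 10.0, 50))
L = 10.0  # largest eigenvalue of A

def noisy_grad(x, sigma):
    """Exact gradient A @ x corrupted by additive Gaussian noise of scale sigma."""
    return A @ x + sigma * rng.standard_normal(x.shape)

def agd(x0, steps, sigma):
    """Nesterov-style accelerated gradient descent run with the noisy oracle."""
    x_prev = x0.copy()
    y = x0.copy()
    t = 1.0
    for _ in range(steps):
        x = y - noisy_grad(y, sigma) / L                 # gradient step at the extrapolated point
        t_next = (1.0 + np.sqrt(1.0 + 4.0 * t * t)) / 2.0
        y = x + ((t - 1.0) / t_next) * (x - x_prev)      # momentum extrapolation
        x_prev, t = x, t_next
    return x_prev

x0 = rng.standard_normal(50)
for sigma in (0.0, 0.1, 1.0):
    x = agd(x0, steps=200, sigma=sigma)
    print(f"sigma={sigma}: f(x) = {0.5 * x @ A @ x:.3e}")
```

With sigma = 0 the fast convergence of the noiseless method is visible; as sigma grows, the final error plateaus well above the noiseless value, which is the kind of noise accumulation the paper analyzes and seeks to reduce.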

[1] K. Lehnertz, et al. Indications of nonlinear deterministic and finite-dimensional structures in time series of brain electrical activity: dependence on recording region and brain state, 2001, Physical Review E.

[2] Shang-Hua Teng, et al. Nearly-linear time algorithms for graph partitioning, graph sparsification, and solving linear systems, 2004, STOC.

[3] Mohit Singh, et al. A geometric alternative to Nesterov's accelerated gradient descent, 2015, arXiv.

[4] Mark W. Schmidt, et al. Minimizing finite sums with the stochastic average gradient, 2013, Mathematical Programming.

[5] Saeed Ghadimi, et al. Optimal Stochastic Approximation Algorithms for Strongly Convex Stochastic Composite Optimization, I: A Generic Algorithmic Framework, 2012, SIAM J. Optim.

[6] Howard Wang, et al. Measurements-based power control: a cross-layered framework, 2013, OFC/NFOEC.

[7] Peter L. Bartlett, et al. Acceleration and Averaging in Stochastic Descent Dynamics, 2017, NIPS.

[8] Jonah Sherman, et al. Area-convexity, l∞ regularization, and undirected multicommodity flow, 2017, STOC.

[9] Zeyuan Allen-Zhu, et al. Nearly-Linear Time Positive LP Solver with Faster Convergence Rate, 2015, STOC.

[10] Huy L. Nguyen, et al. Constrained Submodular Maximization: Beyond 1/e, 2016, FOCS.

[11] Zeyuan Allen-Zhu, et al. A simple, combinatorial algorithm for solving SDD systems in nearly-linear time, 2013, STOC.

[12] É. Moulines, et al. On stochastic proximal gradient algorithms, 2014.

[13] Y. Nesterov. A method for solving the convex programming problem with convergence rate O(1/k^2), 1983.

[14] Bin Hu, et al. Control interpretations for first-order optimization methods, 2017, ACC.

[15] A. V. Gasnikov, et al. Universal Method for Stochastic Composite Optimization Problems, 2018.

[16] Alexander Gasnikov, et al. Stochastic Intermediate Gradient Method for Convex Problems with Stochastic Inexact Oracle, 2016, Journal of Optimization Theory and Applications.

[17] Zeyuan Allen-Zhu, et al. Linear Coupling: An Ultimate Unification of Gradient and Mirror Descent, 2014, ITCS.

[18] Alexandre d'Aspremont, et al. Smooth Optimization with Approximate Gradient, 2005, SIAM J. Optim.

[19] Zeyuan Allen-Zhu, et al. Katyusha: the first direct acceleration of stochastic gradient methods, 2016, J. Mach. Learn. Res.

[20] Andre Wibisono, et al. A variational perspective on accelerated methods in optimization, 2016, Proceedings of the National Academy of Sciences.

[21] Guanghui Lan, et al. An optimal method for stochastic composite optimization, 2011, Mathematical Programming.

[22] Saeed Ghadimi, et al. Optimal Stochastic Approximation Algorithms for Strongly Convex Stochastic Composite Optimization, II: Shrinking Procedures and Optimal Algorithms, 2013, SIAM J. Optim.

[23] Jelena Diakonikolas, et al. The Approximate Duality Gap Technique: A Unified Theory of First-Order Methods, 2017, SIAM J. Optim.

[24] Yurii Nesterov, et al. Relatively Smooth Convex Optimization by First-Order Methods, and Applications, 2016, SIAM J. Optim.

[25] Yurii Nesterov, et al. First-order methods of smooth convex optimization with inexact oracle, 2013, Mathematical Programming.

[26] Alexandre M. Bayen, et al. Accelerated Mirror Descent in Continuous and Discrete Time, 2015, NIPS.

[27] Dimitri P. Bertsekas, et al. Convex Analysis and Optimization, 2003.

[28] A. S. Nemirovsky and D. B. Yudin. Problem Complexity and Method Efficiency in Optimization, 1983.

[29] Sébastien Bubeck, et al. Theory of Convex Optimization for Machine Learning, 2014, arXiv.

[30] Benjamin Recht, et al. Analysis and Design of Optimization Algorithms via Integral Quadratic Constraints, 2014, SIAM J. Optim.

[31] Raef Bassily, et al. Differentially Private Empirical Risk Minimization: Efficient Algorithms and Tight Error Bounds, 2014, arXiv:1405.7085.

[32] Yurii Nesterov, et al. Smooth minimization of non-smooth functions, 2005, Math. Program.

[33] Prateek Jain, et al. Accelerating Stochastic Gradient Descent, 2017, arXiv.

[34] Jelena Diakonikolas, et al. Accelerated Extra-Gradient Descent: A Novel Accelerated First-Order Method, 2018, ITCS.

[35] Sébastien Bubeck, et al. Convex Optimization: Algorithms and Complexity, 2014, Found. Trends Mach. Learn.

[36] Marc Teboulle, et al. A Fast Iterative Shrinkage-Thresholding Algorithm for Linear Inverse Problems, 2009, SIAM J. Imaging Sci.

[37] P. Seiler, et al. Analysis of biased stochastic gradient descent using sequential semidefinite programs, 2017, Math. Program.

[38] Arkadi Nemirovski, et al. Lectures on modern convex optimization: analysis, algorithms, and engineering applications, 2001, MPS-SIAM Series on Optimization.

[39] Gersende Fort, et al. On Perturbed Proximal Gradient Algorithms, 2014, J. Mach. Learn. Res.

[40] Yurii Nesterov, et al. Introductory Lectures on Convex Optimization: A Basic Course, 2014, Applied Optimization.