Robustness of Accelerated First-Order Algorithms for Strongly Convex Optimization Problems

We study the robustness of accelerated first-order algorithms to stochastic uncertainties in gradient evaluation. Specifically, for unconstrained, smooth, strongly convex optimization problems, we examine the mean-square error in the optimization variable when the iterates are perturbed by additive white noise. This type of uncertainty may arise in situations where an approximation of the gradient is sought through measurements of a real system or in a distributed computation over a network. Even though the underlying dynamics of first-order algorithms for this class of problems are nonlinear, we establish upper bounds on the mean-square deviation from the optimal value that are tight up to constant factors. Our analysis quantifies fundamental trade-offs between noise amplification and convergence rates obtained via any acceleration scheme similar to Nesterov's or heavy-ball methods. To gain additional analytical insight, for strongly convex quadratic problems we explicitly evaluate the steady-state variance of the optimization variable in terms of the eigenvalues of the Hessian of the objective function. We demonstrate that the entire spectrum of the Hessian, rather than just the extreme eigenvalues, influences the robustness of noisy algorithms. We specialize this result to the problem of distributed averaging over undirected networks and examine the role of network size and topology in the robustness of noisy accelerated algorithms.
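To make the setup concrete, the following is a minimal sketch (not the paper's exact formulation) of a heavy-ball iteration on a strongly convex quadratic in which additive white noise enters the gradient evaluation; it empirically estimates the steady-state mean-square deviation from the optimizer. The Hessian spectrum, step size, momentum parameter, and noise level are illustrative assumptions, not values taken from the paper.

```python
import numpy as np

# Sketch: noisy heavy-ball method on f(x) = 0.5 * x^T Q x (so x* = 0),
# with additive white noise in the gradient. We estimate the steady-state
# mean-square deviation E||x_k - x*||^2 by time-averaging after a burn-in.
# All numerical choices below are illustrative assumptions.

rng = np.random.default_rng(0)

n = 50                                   # problem dimension
eigs = np.linspace(1.0, 100.0, n)        # Hessian eigenvalues (m = 1, L = 100)
Q = np.diag(eigs)                        # work in the eigenbasis of the Hessian

m, L = eigs[0], eigs[-1]
kappa = L / m

# Standard heavy-ball parameters tuned to the condition number kappa
alpha = 4.0 / (np.sqrt(L) + np.sqrt(m)) ** 2
beta = ((np.sqrt(kappa) - 1.0) / (np.sqrt(kappa) + 1.0)) ** 2

sigma = 1e-2                             # standard deviation of the gradient noise
iters, burn_in = 20000, 5000

x_prev = x = rng.standard_normal(n)
sq_dev = []
for k in range(iters):
    noise = sigma * rng.standard_normal(n)
    grad = Q @ x + noise                 # noisy gradient evaluation
    x_next = x + beta * (x - x_prev) - alpha * grad
    x_prev, x = x, x_next
    if k >= burn_in:
        sq_dev.append(np.dot(x, x))      # ||x_k - x*||^2 with x* = 0

print(f"empirical steady-state E||x - x*||^2 ≈ {np.mean(sq_dev):.3e}")
```

Running the same experiment with plain gradient descent (beta = 0 and a step size of 2/(L + m)) gives one way to observe the trade-off highlighted in the abstract: acceleration improves the convergence rate but amplifies the steady-state variance induced by the noise.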
