Robustness of Accelerated First-Order Algorithms for Strongly Convex Optimization Problems

We study the robustness of accelerated first-order algorithms to stochastic uncertainties in gradient evaluation. Specifically, for unconstrained, smooth, strongly convex optimization problems, we examine the mean-square error in the optimization variable when the iterates are perturbed by additive white noise. This type of uncertainty may arise in situations where an approximation of the gradient is sought through measurements of a real system or in a distributed computation over a network. Even though the underlying dynamics of first-order algorithms for this class of problems are nonlinear, we establish upper bounds on the mean-square deviation from the optimal value that are tight up to constant factors. Our analysis quantifies fundamental trade-offs between noise amplification and convergence rates obtained via any acceleration scheme similar to Nesterov's or heavy-ball methods. To gain additional analytical insight, for strongly convex quadratic problems we explicitly evaluate the steady-state variance of the optimization variable in terms of the eigenvalues of the Hessian of the objective function. We demonstrate that the entire spectrum of the Hessian, rather than just the extreme eigenvalues, influences the robustness of noisy algorithms. We specialize this result to the problem of distributed averaging over undirected networks and examine the role of network size and topology in the robustness of noisy accelerated algorithms.
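To make the setup concrete, the following is a minimal sketch (not the paper's exact formulation) of a heavy-ball iteration on a strongly convex quadratic in which additive white noise enters the gradient evaluation; it empirically estimates the steady-state mean-square deviation from the optimizer. The Hessian spectrum, step size, momentum parameter, and noise level are illustrative assumptions, not values taken from the paper.

```python
import numpy as np

# Sketch: noisy heavy-ball method on f(x) = 0.5 * x^T Q x (so x* = 0),
# with additive white noise in the gradient. We estimate the steady-state
# mean-square deviation E||x_k - x*||^2 by time-averaging after a burn-in.
# All numerical choices below are illustrative assumptions.

rng = np.random.default_rng(0)

n = 50                                   # problem dimension
eigs = np.linspace(1.0, 100.0, n)        # Hessian eigenvalues (m = 1, L = 100)
Q = np.diag(eigs)                        # work in the eigenbasis of the Hessian

m, L = eigs[0], eigs[-1]
kappa = L / m

# Standard heavy-ball parameters tuned to the condition number kappa
alpha = 4.0 / (np.sqrt(L) + np.sqrt(m)) ** 2
beta = ((np.sqrt(kappa) - 1.0) / (np.sqrt(kappa) + 1.0)) ** 2

sigma = 1e-2                             # standard deviation of the gradient noise
iters, burn_in = 20000, 5000

x_prev = x = rng.standard_normal(n)
sq_dev = []
for k in range(iters):
    noise = sigma * rng.standard_normal(n)
    grad = Q @ x + noise                 # noisy gradient evaluation
    x_next = x + beta * (x - x_prev) - alpha * grad
    x_prev, x = x, x_next
    if k >= burn_in:
        sq_dev.append(np.dot(x, x))      # ||x_k - x*||^2 with x* = 0

print(f"empirical steady-state E||x - x*||^2 ≈ {np.mean(sq_dev):.3e}")
```

Running the same experiment with plain gradient descent (beta = 0 and a step size of 2/(L + m)) gives one way to observe the trade-off highlighted in the abstract: acceleration improves the convergence rate but amplifies the steady-state variance induced by the noise.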
