Almost sure convergence rates for Stochastic Gradient Descent and Stochastic Heavy Ball
Aaron Defazio | Robert M. Gower | Othmane Sebbouh | IP Paris
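The two methods named in the title can be summarized by their update rules: SGD takes a step against a stochastic gradient, w_{k+1} = w_k - γ g_k, while stochastic heavy ball (Polyak momentum) adds a momentum term, w_{k+1} = w_k - γ g_k + β (w_k - w_{k-1}). A minimal illustrative sketch of both updates follows, on a hypothetical toy least-squares problem with single-row sampling as the stochastic gradient oracle; the problem setup, step size γ, and momentum β below are illustrative assumptions, not the paper's experiments or its step-size schedules.

```python
import numpy as np

# Toy interpolating least-squares problem (assumed for illustration):
# f(w) = (1/2n) * ||A w - b||^2 with b = A w_star, so w_star is a minimizer.
rng = np.random.default_rng(0)
n, d = 200, 5
A = rng.standard_normal((n, d))
w_star = rng.standard_normal(d)
b = A @ w_star

def stoch_grad(w):
    # Stochastic gradient oracle: gradient of one uniformly sampled row.
    i = rng.integers(n)
    return A[i] * (A[i] @ w - b[i])

def sgd(w0, gamma, steps):
    # Plain SGD: w_{k+1} = w_k - gamma * g_k
    w = w0.copy()
    for _ in range(steps):
        w = w - gamma * stoch_grad(w)
    return w

def shb(w0, gamma, beta, steps):
    # Stochastic heavy ball: w_{k+1} = w_k - gamma * g_k + beta * (w_k - w_{k-1})
    w_prev, w = w0.copy(), w0.copy()
    for _ in range(steps):
        w, w_prev = w - gamma * stoch_grad(w) + beta * (w - w_prev), w
    return w

w0 = np.zeros(d)
err_sgd = np.linalg.norm(sgd(w0, gamma=0.01, steps=5000) - w_star)
err_shb = np.linalg.norm(shb(w0, gamma=0.01, beta=0.5, steps=5000) - w_star)
```

Because the toy problem interpolates (every sampled row is minimized at w_star), constant-step-size SGD drives the iterates toward w_star rather than a noise ball, which is the over-parameterized regime several of the cited works study.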