On the Convergence of Stochastic Gradient Descent with Adaptive Stepsizes
[1] Julien Mairal, et al. Stochastic Majorization-Minimization Algorithms for Large-Scale Optimization, 2013, NIPS.
[2] H. Robbins. A Stochastic Approximation Method, 1951.
[3] Sanjiv Kumar, et al. On the Convergence of Adam and Beyond, 2018.
[4] Mark W. Schmidt, et al. Linear Convergence of Gradient and Proximal-Gradient Methods Under the Polyak-Łojasiewicz Condition, 2016, ECML/PKDD.
[5] Léon Bottou, et al. On-line Learning and Stochastic Approximations, 1999.
[6] Ohad Shamir, et al. Optimal Distributed Online Prediction Using Mini-Batches, 2010, J. Mach. Learn. Res.
[7] Angelia Nedic, et al. On Stochastic Gradient and Subgradient Methods with Adaptive Steplength Sequences, 2011, Autom.
[8] Yuan Cao, et al. On the Convergence of Adaptive Gradient Methods for Nonconvex Optimization, 2018, ArXiv.
[9] Xiaoxia Wu, et al. AdaGrad Stepsizes: Sharp Convergence over Nonconvex Landscapes, from Any Initialization, 2019, ArXiv.
[10] Stephen P. Boyd, et al. Stochastic Mirror Descent in Variationally Coherent Optimization Problems, 2017, NIPS.
[11] Zorana Luzanin, et al. Adaptive Stochastic Approximation Algorithm, 2017, Numerical Algorithms.
[12] John N. Tsitsiklis, et al. Gradient Convergence in Gradient Methods with Errors, 1999, SIAM J. Optim.
[13] Claudio Gentile, et al. Adaptive and Self-Confident On-Line Learning Algorithms, 2000, J. Comput. Syst. Sci.
[14] Karthik Sridharan, et al. Optimization, Learning, and Games with Predictable Sequences, 2013, NIPS.
[15] Alexander Shapiro, et al. Stochastic Approximation Approach to Stochastic Programming, 2013.
[16] Francesco Orabona, et al. Scale-Free Online Learning, 2016, Theor. Comput. Sci.
[17] Jorge Nocedal, et al. Optimization Methods for Large-Scale Machine Learning, 2016, SIAM Rev.
[18] Alfredo N. Iusem, et al. On the Projected Subgradient Method for Nonsmooth Convex Optimization in a Hilbert Space, 1998, Math. Program.
[19] Francesco Orabona, et al. Coin Betting and Parameter-Free Online Learning, 2016, NIPS.
[20] Matthew D. Zeiler. ADADELTA: An Adaptive Learning Rate Method, 2012, ArXiv.
[21] Jimmy Ba, et al. Adam: A Method for Stochastic Optimization, 2014, ICLR.
[22] Boris Polyak. Gradient Methods for the Minimisation of Functionals, 1963.
[23] Yoram Singer, et al. Adaptive Subgradient Methods for Online Learning and Stochastic Optimization, 2011, J. Mach. Learn. Res.
[24] Volkan Cevher, et al. Online Adaptive Methods, Universality and Acceleration, 2018, NeurIPS.
[25] Xiaoxia Wu, et al. WNGrad: Learn the Learning Rate in Gradient Descent, 2018, ArXiv.
[26] Saeed Ghadimi, et al. Stochastic First- and Zeroth-Order Methods for Nonconvex Stochastic Programming, 2013, SIAM J. Optim.
[27] Léon Bottou. Online Learning and Stochastic Approximations, 1998.
[28] Yurii Nesterov, et al. Introductory Lectures on Convex Optimization - A Basic Course, 2014, Applied Optimization.
[29] Francesco Orabona, et al. Black-Box Reductions for Parameter-Free Online Learning in Banach Spaces, 2018, COLT.